How to Self-Host ParadeDB in PostgreSQL

Galaxy Glossary

How do I self-host ParadeDB inside PostgreSQL?

Self-hosting ParadeDB lets you run vector search inside your own PostgreSQL instance for full control and lower latency.

Why choose self-hosting over ParadeDB Cloud?

Self-hosting keeps embeddings and user data inside your VPC, avoids SaaS fees, and lets you tune PostgreSQL exactly for your workload.

Which files do I need to run?

ParadeDB ships a single Docker image that bundles PostgreSQL, the pgvector extension (installed under the name vector), and ParadeDB’s own extensions. No extra build steps.

How do I spin up ParadeDB with Docker Compose?

Create docker-compose.yml and run docker compose up -d. ParadeDB starts on port 5432 with persistent volume mapping for /var/lib/postgresql/data.

services:
  paradedb:
    image: paradedb/paradedb:latest
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: secret
    volumes:
      - pgdata:/var/lib/postgresql/data
volumes:
  pgdata:

Need SSL?

Mount your certificate and key into the container, then set ssl = on and point ssl_cert_file and ssl_key_file at the mounted files. ParadeDB uses the standard PostgreSQL parameters.
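One way to apply the standard PostgreSQL SSL parameters from a superuser SQL session is sketched below. The /certs paths are hypothetical and assume the certificate and key are already mounted into the container with permissions PostgreSQL accepts.

```sql
-- Assumption: certificate and key are mounted at /certs (hypothetical paths)
-- and the key file is owned by, and readable only by, the postgres user.
ALTER SYSTEM SET ssl = 'on';
ALTER SYSTEM SET ssl_cert_file = '/certs/server.crt';
ALTER SYSTEM SET ssl_key_file = '/certs/server.key';

-- SSL settings are re-read on a configuration reload (PostgreSQL 10 and later).
SELECT pg_reload_conf();

-- Run from a NEW connection: shows whether this session is encrypted.
SELECT ssl, version, cipher
FROM pg_stat_ssl
WHERE pid = pg_backend_pid();
```

If the last query reports ssl = f, the client probably connected without requesting SSL; connecting with sslmode=require is a quick check.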

Can I install ParadeDB in an existing cluster?

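Yes. Install the extension’s .control and versioned .sql files into the extension directory under the path printed by pg_config --sharedir, and its shared library into the path printed by pg_config --pkglibdir (a distribution package does both for you). Then run CREATE EXTENSION in each database that needs it.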
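Before running CREATE EXTENSION on an existing cluster, it can help to confirm where the server looks for extension files and whether it can see them. This is a sketch using standard catalog views; the extension names vector and pg_search are assumptions about what you installed, so adjust the list to match.

```sql
-- Where this server expects extension files (pg_config view, superuser by default).
SELECT name, setting
FROM pg_config
WHERE name IN ('SHAREDIR', 'PKGLIBDIR');

-- Which of the expected extensions the server can see, and which are installed
-- in the current database (installed_version is NULL until CREATE EXTENSION runs).
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name IN ('vector', 'pg_search')  -- assumed names; adjust to your packages
ORDER BY name;
```

An extension missing from the second result usually means its files were copied to a different directory than the one SHAREDIR reports.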

What is the minimal SQL to enable ParadeDB?

Connect as superuser, then:

-- pgvector registers under the name "vector"; pg_search is ParadeDB's search extension
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_search;

How do I store product embeddings?

Add a vector column to products and create an ivfflat index.

ALTER TABLE products ADD COLUMN embedding vector(384);
CREATE INDEX idx_products_embedding ON products USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);
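To show what goes into such a column, here is a small sketch with a throwaway table and 3-dimensional vectors so the literals stay readable. The table name demo_products and its values are made up for illustration; real embeddings would come from your model and match the 384 dimensions above.

```sql
-- Hypothetical demo table; requires the pgvector extension to be enabled.
CREATE TABLE demo_products (
    id        bigserial PRIMARY KEY,
    name      text NOT NULL,
    embedding vector(3)
);

-- Vectors are written as bracketed, comma-separated text literals.
INSERT INTO demo_products (name, embedding) VALUES
    ('ultrabook',      '[0.9, 0.1, 0.0]'),
    ('gaming laptop',  '[0.7, 0.6, 0.1]'),
    ('desk lamp',      '[0.0, 0.1, 0.9]');

-- Re-embedding a row is a plain UPDATE.
UPDATE demo_products
SET embedding = '[0.8, 0.5, 0.1]'
WHERE name = 'gaming laptop';
```

An ivfflat index generally works best when created after the table already holds representative data, because the lists are derived from the rows present at build time.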

How do I query for similar products?

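Embed the search phrase, then sort by L2 distance with the <-> operator, which matches the vector_l2_ops index (<=> is cosine distance). The query assumes an embedding helper named paradedb_embed is available in your installation; if it is not, pass a query vector computed by your application instead.

SELECT id, name
FROM products
ORDER BY embedding <-> paradedb_embed('lightweight laptop')
LIMIT 5;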
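If no embedding helper is available inside the database, a similarity query can still be exercised by using an existing row as the query vector. The sketch below assumes a product with id 42 exists (a hypothetical value) and raises ivfflat.probes, pgvector's setting for how many lists each query scans.

```sql
BEGIN;

-- More probes means better recall and slower queries; the pgvector default is 1.
SET LOCAL ivfflat.probes = 10;

-- Five products closest (by L2 distance) to product 42, excluding itself.
SELECT id, name
FROM products
WHERE id <> 42
ORDER BY embedding <-> (SELECT embedding FROM products WHERE id = 42)
LIMIT 5;

COMMIT;
```

SET LOCAL keeps the probes change scoped to this transaction, so other queries on the same connection keep the default.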

How do I back up the instance?

Because ParadeDB is plain PostgreSQL, use pg_dump, pg_basebackup, or any managed-storage snapshot strategy.

Best practice: tune work_mem & shared_buffers

Set shared_buffers to about 25% of RAM as a starting point and work_mem to at least 32MB so sorts during ANN queries are less likely to spill to temporary files.
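These settings can be applied from SQL with ALTER SYSTEM. The values below are a sketch that assumes a dedicated host with 16 GB of RAM; scale them to your machine.

```sql
-- Assumption: 16 GB of RAM, so 25% is 4 GB.
ALTER SYSTEM SET shared_buffers = '4GB';  -- only takes effect after a server restart
ALTER SYSTEM SET work_mem = '32MB';       -- takes effect on reload, for new sessions

SELECT pg_reload_conf();

-- pending_restart = t marks settings that are written but still need a restart.
SELECT name, setting, unit, pending_restart
FROM pg_settings
WHERE name IN ('shared_buffers', 'work_mem');
```

Note that work_mem applies per sort or hash operation, not per connection, so a high value multiplied by many concurrent queries can exhaust memory.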

Common mistakes and fixes

Missing ivfflat analyze: run VACUUM ANALYZE products; after bulk inserts so the planner uses the index.

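Too few lists: with a low lists setting each list holds many vectors, so every probe scans more rows and queries slow down. pgvector’s guidance is roughly rows / 1000 lists for up to 1M rows and sqrt(rows) above that, so a table with 1M vectors wants about 1000 lists. Rebuild the index after changing the value.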
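The sizing can be computed from the table itself. This sketch applies the rule of thumb from the pgvector documentation (rows / 1000 up to 1M rows, sqrt(rows) beyond that) and then rebuilds the index; the value 1000 in the CREATE INDEX is an example for a table of about 1M rows, not a universal setting.

```sql
-- Suggest a lists value from the current row count.
SELECT count(*) AS row_count,
       CASE
           WHEN count(*) <= 1000000 THEN GREATEST(count(*) / 1000, 1)
           ELSE ceil(sqrt(count(*)))::bigint
       END AS suggested_lists
FROM products;

-- Rebuild with the chosen value (example: roughly 1M rows).
DROP INDEX IF EXISTS idx_products_embedding;
CREATE INDEX idx_products_embedding
    ON products USING ivfflat (embedding vector_l2_ops)
    WITH (lists = 1000);

-- Refresh planner statistics so the new index is considered.
VACUUM ANALYZE products;
```

On a busy table, CREATE INDEX CONCURRENTLY under a new name followed by dropping the old index avoids blocking writes during the rebuild.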

How to Self-Host ParadeDB in PostgreSQL Example Usage


-- Find customers who bought products similar to a search phrase
WITH similar_products AS (
  SELECT id
  FROM products
  ORDER BY embedding <-> paradedb_embed('gaming laptop')
  LIMIT 20
)
SELECT c.id, c.name, SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o  ON o.customer_id = c.id
JOIN orderitems oi ON oi.order_id   = o.id
WHERE oi.product_id IN (SELECT id FROM similar_products)
GROUP BY c.id, c.name
ORDER BY total_spent DESC
LIMIT 5;

How to Self-Host ParadeDB in PostgreSQL Syntax


-- Enable extension in a fresh ParadeDB container
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_search;

-- Embed products and index
ALTER TABLE products ADD COLUMN embedding vector(384);
CREATE INDEX idx_products_embedding ON products USING ivfflat (
    embedding vector_l2_ops
) WITH (lists = 100);

-- Query nearest neighbours with ParadeDB helper
SELECT id, name, price
FROM products
ORDER BY embedding <-> paradedb_embed('noise-cancelling headphones')
LIMIT 10;

Frequently Asked Questions (FAQs)

Can I run ParadeDB on Postgres 16?

Yes. The Docker image is rebuilt for every minor release. For manual installs, compile pgvector against Postgres 16.

Does ParadeDB support cosine similarity?

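Yes. Build the index with vector_cosine_ops and order by embedding <=> paradedb_embed('query'). The <=> operator returns cosine distance, so cosine similarity is 1 minus that value; <#> is negative inner product, not cosine.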
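A cosine setup might look like the sketch below. It relies on pgvector's <=> operator, which returns cosine distance, and uses an existing row (hypothetical id 42) as the query vector so that it does not depend on any embedding helper. The index name is made up for illustration.

```sql
-- Separate index for cosine distance; the operator class must match the operator.
CREATE INDEX idx_products_embedding_cosine
    ON products USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

-- Five products most similar to product 42, with similarity = 1 - cosine distance.
SELECT id,
       name,
       1 - (embedding <=> (SELECT embedding FROM products WHERE id = 42)) AS cosine_similarity
FROM products
WHERE id <> 42
ORDER BY embedding <=> (SELECT embedding FROM products WHERE id = 42)
LIMIT 5;
```

Keeping the ORDER BY on the raw distance expression, rather than on the computed similarity, is what lets the planner use the ivfflat index.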

Is ParadeDB open-source?

Yes. The community edition is licensed under AGPL-3.0, and the source is on GitHub at paradedb/paradedb.
