ParadeDB is an open-source PostgreSQL extension that adds high-performance vector similarity search and hybrid retrieval to your database.
ParadeDB integrates ANN vector search, hybrid BM25 ranking, and RAG-ready functions directly into Postgres, letting you build AI & search features without extra infra.
Copy the compiled paradedb files into $PGHOME/lib, add the library to shared_preload_libraries, and restart the server so it is loaded. Then run CREATE EXTENSION paradedb; in each database that needs it.
Use CREATE INDEX ... USING paradedb_ivfflat on a vector column. Choose the dim, metric, and lists parameters to balance speed against accuracy.
CREATE INDEX idx_products_embeddings
ON products USING paradedb_ivfflat (embedding vector_l2(1536))
WITH (lists = 100);
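To see why lists is a speed-accuracy knob, here is a toy Python sketch of the inverted-file idea behind IVFFlat. It is not ParadeDB's implementation: the centroids are hard-coded rather than trained, and the probes parameter (how many lists a query scans) is an assumption borrowed from how IVF indexes generally work, not something the text above documents.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_lists(vectors, centroids):
    """Assign each (id, vector) pair to the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for vec_id, vec in vectors:
        nearest = min(range(len(centroids)), key=lambda i: l2(vec, centroids[i]))
        lists[nearest].append((vec_id, vec))
    return lists

def search(lists, centroids, query, k, probes=1):
    """Scan only the `probes` lists whose centroids are closest to the query."""
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [item for i in order[:probes] for item in lists[i]]
    candidates.sort(key=lambda item: l2(item[1], query))
    return [vec_id for vec_id, _ in candidates[:k]]

# Hypothetical 2-D data; real embeddings would have hundreds of dimensions.
centroids = [[0.0, 0.0], [10.0, 10.0]]
vectors = [(1, [0.5, 0.5]), (2, [1.0, 0.0]), (3, [9.0, 9.5]), (4, [4.9, 4.9])]
lists = build_lists(vectors, centroids)

print(search(lists, centroids, [5.2, 5.2], k=1, probes=1))  # [3]  (fast, misses the true neighbor)
print(search(lists, centroids, [5.2, 5.2], k=1, probes=2))  # [4]  (scans everything, exact)
```

Vector 4 sits near the boundary between the two lists, so a query that scans a single list can miss it. More lists mean fewer vectors scanned per query, at the cost of more boundary misses like this one.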
Use the <-> distance operator with ORDER BY and LIMIT for nearest neighbors, or call paradedb.knn() for extra options like filtering.
SELECT id, name, price
FROM products
ORDER BY embedding <-> paradedb.vector('[0.12, ...]')
LIMIT 5;
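Conceptually, ordering by distance and taking the first five rows is a k-nearest-neighbor search. The Python sketch below computes the exact answer by brute force, which is what an approximate index tries to match at lower cost. It assumes L2 (Euclidean) distance and uses made-up two-dimensional rows purely for illustration.

```python
import heapq
import math

def l2_distance(a, b):
    """Euclidean distance; both vectors must have the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(rows, query, k=5):
    """rows: iterable of (id, embedding). Returns k (id, distance) pairs, closest first."""
    scored = ((row_id, l2_distance(emb, query)) for row_id, emb in rows)
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])

# Hypothetical rows standing in for products.embedding.
rows = [(1, [0.0, 0.0]), (2, [1.0, 0.0]), (3, [3.0, 4.0])]
print(nearest(rows, [0.0, 0.0], k=2))  # [(1, 0.0), (2, 1.0)]
```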
Yes. paradedb.hybrid_rank() combines BM25 scores from tsvector columns with vector distance, returning a single relevance score.
SELECT p.id, p.name
FROM products p
CROSS JOIN LATERAL paradedb.hybrid_rank(p.embedding, p.search_tsv, 'wireless headphones') r
ORDER BY r.score DESC
LIMIT 10;
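The text does not say how the two signals are blended, so the following Python sketch shows only one common approach: min-max normalize both, flip distance into closeness, and take a weighted sum. The function names and the text_weight parameter are hypothetical, and the actual formula used by hybrid_rank() may differ.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def hybrid_scores(bm25, distances, text_weight=0.5):
    """Blend BM25 (higher is better) with vector distance (lower is better)."""
    if len(bm25) != len(distances):
        raise ValueError("bm25 and distances must have the same length")
    text = min_max(bm25)
    closeness = [1.0 - d for d in min_max(distances)]
    return [text_weight * t + (1.0 - text_weight) * c
            for t, c in zip(text, closeness)]

# Hypothetical scores for three candidate rows.
scores = hybrid_scores(bm25=[2.0, 0.0, 1.0], distances=[0.4, 0.2, 0.8])
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 1, 2]
```

Row 0 wins because it has the best text match and a reasonably close vector. Row 1 has the closest vector but no text match, so it lands second at equal weighting.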
Start with 1–2K vectors per list for IVFFlat. Always run VACUUM ANALYZE after bulk loads. Store vectors as FLOAT4[] for smaller disk usage.
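The sizing guidance reduces to simple arithmetic. This Python sketch turns a row count into a starting lists value and estimates the raw payload of one 4-byte-float vector. The helper names are hypothetical, and the byte count ignores row and array headers.

```python
import math

def suggest_lists(row_count, vectors_per_list=1500):
    """Starting `lists` value so each list holds about `vectors_per_list` vectors."""
    if row_count <= 0 or vectors_per_list <= 0:
        raise ValueError("row_count and vectors_per_list must be positive")
    return max(1, math.ceil(row_count / vectors_per_list))

def float4_payload_bytes(dim):
    """Raw size of one vector stored as 4-byte floats, excluding headers."""
    return dim * 4

print(suggest_lists(1_000_000, 2000))  # 500
print(suggest_lists(1_000_000, 1000))  # 1000
print(float4_payload_bytes(1536))      # 6144
```

By the same rule, lists = 100 corresponds to a table of roughly 100,000 to 200,000 vectors. Treat these numbers as a starting point and tune against measured recall and latency.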
Product recommendation, semantic customer support search, AI-powered dashboards, and real-time personalization are typical ParadeDB workloads inside ecommerce stacks.
Yes, the engine is battle-tested at scale, but always benchmark on your workload and enable proper monitoring.
Not yet. Current releases focus on CPU-optimized ANN algorithms that work on commodity hardware.
No. Each vector column has a fixed dimension declared at creation time. Use separate columns or tables for other sizes.
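Because the dimension is fixed per column, it can help to reject mismatched embeddings in application code before they reach the database. A minimal Python sketch, assuming a column declared with 1536 dimensions. The helper name is hypothetical.

```python
EXPECTED_DIM = 1536  # assumed to match the column's declared dimension

def check_dimension(embedding, expected_dim=EXPECTED_DIM):
    """Raise ValueError unless the embedding has exactly expected_dim components."""
    if len(embedding) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(embedding)}")
    return embedding

check_dimension([0.0] * 1536)  # passes silently
try:
    check_dimension([0.0] * 768)  # e.g. output of a different embedding model
except ValueError as err:
    print(err)  # expected 1536 dimensions, got 768
```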