ParadeDB best practices ensure fast, accurate vector search in PostgreSQL through correct extension setup, indexing strategy, and query design.
ParadeDB layers lightning-fast vector and hybrid search on top of PostgreSQL. It reuses familiar SQL while adding approximate-nearest-neighbor (ANN) indexes such as IVFFLAT and HNSW. Following best practices avoids slow queries, bloated indexes, and downtime.
Run CREATE EXTENSION IF NOT EXISTS paradedb;
in a superuser session. Verify with \dx
. Store the extension in postgresql.conf
if you want it pre-loaded at startup (shared_preload_libraries = 'paradedb'
).
Yes. ParadeDB builds on pgvector
. Install it first, then add ParadeDB. Keep both extensions at identical minor versions to avoid operator signature mismatches.
Use IVFFLAT for < 1 M rows and frequent writes; choose HNSW for > 1 M rows or read-heavy workloads. Always create indexes with a filter to cut the search space (WHERE deleted_at IS NULL
) to improve recall.
Store vectors in a vector(n)
column sized to your embedding model (e.g., 1536 for OpenAI text-embedding-3-small). Compress rows with ALTER TABLE ... SET (toast.compress = 'lz4')
to shrink cold storage without harming performance.
Always combine similarity search with relational filters. Set SET ivfflat.probes = k
(1–20) per session to balance accuracy and latency. Fetch only the columns you need—especially avoid SELECT *
on large vector tables.
SELECT id, name, price
FROM products
WHERE stock > 0
ORDER BY embedding <-> $1
LIMIT 5;
Recluster IVFFLAT indexes after large inserts: ALTER INDEX products_embedding_ivf ATTACH PARTITION ...
then REINDEX
. For HNSW, set maintenance_work_mem
generously (≥4 GB) during build to avoid disk spills.
Every time the table grows by ~20 %. Drop and recreate the index in a transaction or use a concurrently-built shadow index to stay online.
Grant USAGE
on the extension schema only to roles that need vector search. Revoke EXECUTE
on experimental operators in production.
Yes. Create identical vector indexes on each partition or a global index once support lands in your Postgres version.
Absolutely. ParadeDB lets you ORDER BY ts_rank DESC, embedding <-> $1
to combine lexical and semantic relevance.
Create a new embedding_v2
column, backfill in batches, build a new index, then swap columns and drop the old index in a single transaction.