Optimize ParadeDB queries by adding the right indexes, tuning index parameters, and inspecting query plans to reduce latency and CPU usage.
Slow queries usually scan the entire table or use a sub-optimal index. Use EXPLAIN (ANALYZE, BUFFERS)
to reveal whether a sequential scan, high shared hit count, or CPU-bound distance calculation causes the delay.
Create an ivfflat
or hnsw
index on the embedding
column.ParadeDB offloads distance computation to these indexes, cutting query time from seconds to milliseconds for large product catalogs.
Use CREATE INDEX idx_prod_vec ON products USING ivfflat (embedding vector_cosine_ops) WITH (lists=100);
for balanced recall and speed on ~1 M rows.
Run EXPLAIN (ANALYZE, BUFFERS)
before and after adding an index.Look for an Index Scan using idx_prod_vec
node and verify Total Runtime
drops.
Raise lists
when recall is too low; lower it when latency matters more. Start with lists = sqrt(table_rows)
and benchmark.
Combine a vector index on embedding
with a B-tree index on stock
or price
to filter first, then KNN search.Use materialized views for popular queries and schedule VACUUM ANALYZE
.
Parameterized SQL prevents replans and SQL injection, keeping the planner’s statistics accurate.
Disable sequential scans with SET enable_seqscan = off;
. Remember to revert the setting afterward.
Outdated stats mislead the planner. Run ANALYZE products;
after bulk loads so ParadeDB picks the optimal path.
.
Yes. ParadeDB builds on pgvector. Ensure the vector
extension is installed before creating embedding columns.
Yes. Use a GIN
index for full-text search on name
, then order the filtered set by vector distance.
Run VACUUM (ANALYZE)
daily for active product tables to keep bloat low and statistics fresh.