How to Tune Queries in ParadeDB

Galaxy Glossary

How do I tune ParadeDB queries for faster vector search?

ParadeDB query tuning trims latency for ANN searches by adjusting planner settings, index parameters, and SQL structure.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.

Description

Table of Contents

Why does ParadeDB need query tuning?

ParadeDB builds on PostgreSQL, so the planner still decides between sequential scans, IVFFlat index scans, or HNSW. Default settings assume small data; large vector stores need manual tweaks for millisecond searches.

How do I inspect a slow ParadeDB query?

Start with EXPLAIN (ANALYZE, BUFFERS). It shows whether the call uses an IVFFlat/HNSW index or falls back to a sequential scan. Look for Seq Scan lines; they indicate missing indexes or bad parameters.

Example

EXPLAIN (ANALYZE, BUFFERS)SELECT id, nameFROM productsORDER BY embedding <-> '[0.12,0.03, … ]' LIMIT 10;

How do I force the planner to use the ANN index?

Set planner flags before running the query or in the session:

SET enable_seqscan TO off;SET enable_bitmapscan TO off;

These nudge PostgreSQL toward the IVFFlat/HNSW index without touching global config.

Which ParadeDB index parameters matter most?

IVFFlat: tune lists on creation (CREATE INDEX … lists = 2048) and ivfflat.probes at query time (SET ivfflat.probes TO 10;). More lists and probes raise recall but slow reads; measure both.

HNSW: tweak M and ef_construction at build, hnsw.ef_search per query. Higher values improve accuracy with extra CPU.

How do I batch similar ANN lookups?

Wrap multiple target vectors in a CTE and join once. This avoids repeated index startups.

WITH target(vec) AS ( VALUES ('[0.1, … ]'::vector), ('[0.2, … ]'::vector))SELECT p.id, t.vecFROM target t, LATERAL ( SELECT id FROM products ORDER BY embedding <-> t.vec LIMIT 5) p;

What Postgres-level settings affect ParadeDB?

Increase shared_buffers, work_mem, and maintenance_work_mem to keep vector pages hot and speed index builds. Disable track_io_timing in prod if not debugging; it adds overhead.

Best practices recap

  • Always EXPLAIN ANALYZE first
  • Create the right ANN index (IVFFlat for fast writes, HNSW for higher recall)
  • Raise ivfflat.probes or hnsw.ef_search only as needed
  • Batch vector searches with CTEs
  • Pin hot vectors with generous shared buffers

Why How to Tune Queries in ParadeDB is important

How to Tune Queries in ParadeDB Example Usage


-- Find top 3 similar products to a customer’s last purchase
WITH last_item AS (
  SELECT p.embedding
  FROM orders o
  JOIN orderitems oi ON oi.order_id = o.id
  JOIN products p  ON p.id = oi.product_id
  WHERE o.customer_id = 42
  ORDER BY o.order_date DESC
  LIMIT 1
)
SELECT pr.id, pr.name, pr.price
FROM last_item li, LATERAL (
  SELECT id, name, price
  FROM products pr
  ORDER BY pr.embedding <-> li.embedding LIMIT 3
) pr;

How to Tune Queries in ParadeDB Syntax


-- IVFFlat index creation
CREATE INDEX products_embedding_ivfflat
ON products USING ivfflat (embedding vector_l2_ops)
WITH (lists = 2048);

-- HNSW index creation
CREATE INDEX products_embedding_hnsw
ON products USING hnsw (embedding vector_l2_ops)
WITH (M = 16, ef_construction = 200);

-- Session-level tuning before a query
SET enable_seqscan = off;        -- force index usage
SET ivfflat.probes = 10;         -- accuracy vs. speed
SET hnsw.ef_search = 100;        -- accuracy vs. speed

-- Tuned query
SELECT id, name, price
FROM products
ORDER BY embedding <-> '[0.12,0.03,…]' LIMIT 10;

Common Mistakes

Frequently Asked Questions (FAQs)

Does disabling seqscan hurt other queries?

No, only the current session is affected. Reset with SET enable_seqscan = on;.

Should I store vectors in a separate table?

Yes if they change frequently. Isolating updates avoids index bloat on the main product catalog.

When do I choose HNSW over IVFFlat?

Choose HNSW for read-heavy workloads needing high recall (>0.95). IVFFlat suits mixed read/write or smaller datasets.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.