How to Test Queries in ParadeDB PostgreSQL

Galaxy Glossary

How do I test vector-search queries in ParadeDB?

The test_queries() helper in ParadeDB quickly benchmarks vector-search SQL against a chosen index.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.

Description

Table of Contents

Why should I run test_queries() in ParadeDB?

Use test_queries() to measure recall and latency of your vector index before shipping a semantic-search feature. The function executes a set of embeddings against the target index, captures timing, and returns aggregated metrics.

What is the basic workflow?

1) Load or generate query embeddings. 2) Call test_queries() with the index name, embedding array, and k-nearest neighbors. 3) Inspect the returned table for avg_ms, p95_ms, and recall. 4) Tune index parameters if needed.

How do I install ParadeDB?

Install the extension, then create it in your database: CREATE EXTENSION paradedb; ParadeDB requires PostgreSQL ≥15 and the vector extension.

What parameters does test_queries() accept?

index_name TEXT – the name of the vector index.
query_embeddings VECTOR[] – array of query embeddings.
k INT – number of neighbors to retrieve.
Optional named args mirror the underlying ANN parameters (ef_search, metric, etc.).

How do I interpret the result?

The function returns one row per query plus an aggregate row labeled _summary. Key columns: avg_ms, p95_ms, recall, rows_scanned. Aim for sub-50 ms avg latency and ≥0.9 recall.

Should I test on production data?

Yes—clone a production snapshot into a staging database. Accurate cardinality and distribution are essential for realistic benchmarks.

How can I speed up slow queries?

Lower ef_search, increase index M, or add a pre-filter (e.g., product category) before the vector condition. Re-run test_queries() after each change.

Why How to Test Queries in ParadeDB PostgreSQL is important

How to Test Queries in ParadeDB PostgreSQL Example Usage


-- Benchmark semantic product search
SELECT *
FROM paradedb.test_queries(
    index_name       => 'products_name_embedding_idx',
    query_embeddings => ARRAY[
        pgml.embed('usb-c charger')::vector,
        pgml.embed('gaming headset')::vector
    ],
    k => 10
) AS t;
-- Expect a summary row with avg_ms and recall columns

How to Test Queries in ParadeDB PostgreSQL Syntax


SELECT paradedb.test_queries(
    index_name       => 'products_name_embedding_idx',
    query_embeddings => ARRAY[
       pgml.embed('wireless mouse')::vector,
       pgml.embed('mechanical keyboard')::vector
    ],
    k                => 5
);

Common Mistakes

Frequently Asked Questions (FAQs)

Does test_queries() modify my data?

No, it only reads from the index and records timing in memory.

Can I test multiple indexes at once?

Call the function in a UNION ALL query, passing each index name separately, then compare summary rows.

How many query embeddings should I use?

Twenty to fifty diverse queries usually provide stable latency and recall estimates.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
Oops! Something went wrong while submitting the form.