ParadeDB adds vector-based data types to PostgreSQL for fast similarity search across text, images, and other embeddings.
ParadeDB ships a single purpose-built type, vector(dim), that stores fixed-length float32 embeddings. The type supports K-nearest-neighbor (KNN) operators for cosine, Euclidean, and inner-product distance.
Create the column with an explicit dimension: vector(768) for common text models or vector(512) for images. ParadeDB enforces the declared length at insert time, catching malformed embeddings early.
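As a minimal sketch, a table holding 768-dimensional text embeddings could be declared like this (the table and column names here are illustrative, not from the text):

```sql
-- Hypothetical table; vector(768) fixes the embedding length.
CREATE TABLE Documents (
    id        bigserial PRIMARY KEY,
    body      text NOT NULL,
    embedding vector(768)
);

-- An insert whose vector is not exactly 768 floats long is rejected
-- at insert time rather than silently stored.
```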
The Products table gains an embedding column to enable semantic search across product descriptions.
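Assuming the Products table already exists and a 768-dimensional text model (an assumption; use your model's output size), the column can be added in place:

```sql
-- Add the embedding column to the existing table.
ALTER TABLE Products ADD COLUMN embedding vector(768);

-- Backfill from an application that generates embeddings, e.g.:
-- UPDATE Products SET embedding = $1 WHERE id = $2;
```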
Use CREATE INDEX ... USING hnsw for fast approximate search, or ivfflat for cheaper builds and lower memory use at some cost in recall. Always match the index's operator class to the distance operator your queries use.
CREATE INDEX products_emb_idx ON Products USING hnsw (embedding vector_l2_ops); builds a graph-based index optimized for L2 (Euclidean) distance.
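For comparison, an IVFFlat index on the same column might look like the following; the lists value is an illustrative starting point, not a recommendation from the text:

```sql
-- IVFFlat partitions vectors into lists; fewer lists scanned at query
-- time trades recall for speed.
CREATE INDEX products_emb_ivf_idx ON Products
    USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- HNSW build parameters can also be tuned (values shown are assumptions):
-- CREATE INDEX ... USING hnsw (embedding vector_l2_ops)
--     WITH (m = 16, ef_construction = 64);
```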
Leverage the KNN syntax: ORDER BY embedding <-> $1 LIMIT 10. The operator selects the metric: <-> is Euclidean distance, <#> is negative inner product, and <=> is cosine distance. The planner uses the index only when the query operator matches the index's operator class.
Find the 5 closest products to a customer-supplied vector, using the <-> operator to match the L2 index: SELECT id, name FROM Products ORDER BY embedding <-> $1 LIMIT 5;
Standardize preprocessing steps—tokenization, model version, and normalization—before generating embeddings. Batch index maintenance during off-peak hours to avoid write amplification.
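One way to batch that maintenance off-peak, assuming the index name used earlier, is a concurrent rebuild (REINDEX ... CONCURRENTLY is available in PostgreSQL 12 and later):

```sql
-- Rebuild without blocking concurrent reads and writes;
-- schedule this during a low-traffic window.
REINDEX INDEX CONCURRENTLY products_emb_idx;
```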
Yes. Use vector_l2_ops, vector_ip_ops, or vector_cosine_ops in the index depending on your needs.
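For example, a cosine-distance index pairs with the <=> operator; mixing an operator class with a different query operator means the index is skipped (names below are illustrative):

```sql
-- Cosine opclass: queries must use <=> to hit this index.
CREATE INDEX products_emb_cos_idx ON Products
    USING hnsw (embedding vector_cosine_ops);

SELECT id, name FROM Products
ORDER BY embedding <=> $1   -- cosine distance; matches vector_cosine_ops
LIMIT 10;
```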
PostgreSQL TOAST automatically handles large vectors, but keep dimensions reasonable (≤3072) to avoid memory pressure.
Standard UPDATE statements work; updated vectors are picked up by the index automatically. Rebuild the HNSW index periodically after heavy churn to keep recall high.
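A sketch of re-embedding a single row after its description changes, with the vector and id bound as parameters from the application:

```sql
-- Replace one product's embedding in place; the vector index
-- reflects the new value without a manual refresh.
UPDATE Products SET embedding = $1 WHERE id = $2;
```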