ParadeDB adds vector-based data types to PostgreSQL for fast similarity search across text, images, and other embeddings.
ParadeDB ships a single purpose-built type, vector(dim), that stores fixed-length float32 embeddings. The type supports K-nearest-neighbor (KNN) operators for cosine, Euclidean, and inner-product distance.
Create the column with an explicit dimension: vector(768) for common text models or vector(512) for images. ParadeDB enforces the declared length at insert time, catching malformed embeddings early.
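As a minimal sketch, a table holding 768-dimensional text embeddings could be declared like this (the table and column names here are illustrative, not from the text):

```sql
-- Hypothetical table; vector(768) fixes the embedding length.
CREATE TABLE Documents (
    id        bigserial PRIMARY KEY,
    body      text NOT NULL,
    embedding vector(768)
);

-- An insert whose vector is not exactly 768 floats long is rejected
-- at insert time rather than silently stored.
```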
The Products table gains an embedding column to enable semantic search across product descriptions.
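Assuming the Products table already exists and a 768-dimensional text model (an assumption; use your model's output size), the column can be added in place:

```sql
-- Add the embedding column to the existing table.
ALTER TABLE Products ADD COLUMN embedding vector(768);

-- Backfill from an application that generates embeddings, e.g.:
-- UPDATE Products SET embedding = $1 WHERE id = $2;
```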
Use CREATE INDEX ... USING hnsw for fast approximate search, or ivfflat for cheaper builds and lower memory use at some cost in recall. Always match the index's operator class to the distance operator your queries use.
CREATE INDEX products_emb_idx ON Products USING hnsw (embedding vector_l2_ops); builds a graph-based index optimized for L2 (Euclidean) distance.
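For comparison, an IVFFlat index on the same column might look like the following; the lists value is an illustrative starting point, not a recommendation from the text:

```sql
-- IVFFlat partitions vectors into lists; fewer lists scanned at query
-- time trades recall for speed.
CREATE INDEX products_emb_ivf_idx ON Products
    USING ivfflat (embedding vector_l2_ops) WITH (lists = 100);

-- HNSW build parameters can also be tuned (values shown are assumptions):
-- CREATE INDEX ... USING hnsw (embedding vector_l2_ops)
--     WITH (m = 16, ef_construction = 64);
```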
Leverage the KNN syntax: ORDER BY embedding <-> $1 LIMIT 10. The operator selects the metric: <-> is Euclidean distance, <#> is negative inner product, and <=> is cosine distance. The planner uses the index only when the query operator matches the index's operator class.
Find the 5 closest products to a customer-supplied vector, using the <-> operator to match the L2 index: SELECT id, name FROM Products ORDER BY embedding <-> $1 LIMIT 5;
Standardize preprocessing steps—tokenization, model version, and normalization—before generating embeddings. Batch index maintenance during off-peak hours to avoid write amplification.
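One way to batch that maintenance off-peak, assuming the index name used earlier, is a concurrent rebuild (REINDEX ... CONCURRENTLY is available in PostgreSQL 12 and later):

```sql
-- Rebuild without blocking concurrent reads and writes;
-- schedule this during a low-traffic window.
REINDEX INDEX CONCURRENTLY products_emb_idx;
```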
Yes. Use vector_l2_ops, vector_ip_ops, or vector_cosine_ops in the index depending on your needs.
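For example, a cosine-distance index pairs with the <=> operator; mixing an operator class with a different query operator means the index is skipped (names below are illustrative):

```sql
-- Cosine opclass: queries must use <=> to hit this index.
CREATE INDEX products_emb_cos_idx ON Products
    USING hnsw (embedding vector_cosine_ops);

SELECT id, name FROM Products
ORDER BY embedding <=> $1   -- cosine distance; matches vector_cosine_ops
LIMIT 10;
```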
PostgreSQL TOAST automatically handles large vectors, but keep dimensions reasonable (≤3072) to avoid memory pressure.
Standard UPDATE statements work; updated vectors are picked up by the index automatically. Rebuild the HNSW index periodically after heavy churn to keep recall high.
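A sketch of re-embedding a single row after its description changes, with the vector and id bound as parameters from the application:

```sql
-- Replace one product's embedding in place; the vector index
-- reflects the new value without a manual refresh.
UPDATE Products SET embedding = $1 WHERE id = $2;
```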