This guide demystifies vector databases—why they matter, how they work, and when to use them. You’ll learn core concepts like embeddings and approximate-nearest-neighbor indexing, build a hands-on semantic-search demo, and discover best practices plus tips for querying pgvector straight from Galaxy’s SQL editor.
A vector is an ordered list of numbers that represents an item’s position in a multi-dimensional space. In machine learning, vectors typically come from an embedding model—for example, turning a sentence like “Galaxy is a modern SQL editor” into a 1 × 1536 float array.
Vectors let us compare complex objects by geometric distance instead of exact equality. That unlocks applications such as semantic search, recommendations, and duplicate detection.
An embedding function (BERT, OpenAI, Cohere, etc.) maps raw data—text, images, audio—to dense vectors. The choice of model affects the quality and dimensionality (e.g., 384 vs 1536 dimensions).
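To make the shape of the output concrete, here is a toy, hypothetical embedding function. A real model such as BERT or text-embedding-3-small is a trained neural network, but the contract is the same: text in, fixed-length dense vector out.

```python
import hashlib
import math

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Hash each token into a fixed-length dense vector.
    Illustration only -- real embedding models are learned, not hashed."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    # L2-normalize so cosine similarity reduces to a dot product
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

v = toy_embed("Galaxy is a modern SQL editor")
print(len(v))  # 8 -- a production model might emit 384 or 1536 dimensions
```

Swapping `dim` from 8 to 1536 changes nothing structurally, which is why the dimensionality choice is a model property, not a database one.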
Common distance functions include:
- cosine – angle between vectors (scale-invariant).
- euclidean – straight-line (L2) distance.
- dot_product – unnormalized similarity (good for transformers).

Your database must support the metric you plan to query with.
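The three metrics are a few lines each in plain Python, which makes their differences easy to see—note how cosine ignores magnitude while the other two do not:

```python
import math

def cosine_distance(a, b):
    """1 - cos(angle): 0 means same direction, regardless of scale."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1 - dot / (na * nb)

def euclidean_distance(a, b):
    """Straight-line (L2) distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dot_product(a, b):
    """Unnormalized similarity: grows with both alignment and magnitude."""
    return sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [2.0, 0.0]
print(cosine_distance(a, b))    # 0.0 -- same direction, scale ignored
print(euclidean_distance(a, b)) # 1.0
print(dot_product(a, b))        # 2.0
```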
Brute-force scanning every vector is O(N · d). ANN indexes such as HNSW (Hierarchical Navigable Small World) or IVF (Inverted File) trade tiny accuracy loss for 100-1000× speedups—critical when your table has millions of rows.
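For intuition, here is the O(N · d) brute-force baseline that those indexes beat—every query must score all N stored vectors:

```python
import heapq
import math

def brute_force_knn(query, vectors, k=5):
    """O(N * d): compute the L2 distance to every stored vector,
    then keep the indices of the k closest."""
    def dist(v):
        return math.sqrt(sum((q - x) ** 2 for q, x in zip(query, v)))
    return heapq.nsmallest(k, range(len(vectors)), key=lambda i: dist(vectors[i]))

# Hypothetical data: 1,000 points along the diagonal
vectors = [[float(i), float(i)] for i in range(1000)]
print(brute_force_knn([3.1, 3.1], vectors, k=3))  # [3, 4, 2]
```

HNSW and IVF avoid touching all N rows by navigating a graph or probing a few clusters, which is where the 100-1000× speedup comes from.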
A vector database typically stores:
- id: UUID / BIGINT
- embedding: VECTOR(1536)
- metadata: JSONB (title, url, etc.)
Metadata enables post-filtering (e.g., date range, user_id) before similarity search.
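A minimal in-memory sketch of that pattern, using hypothetical documents and a user_id filter—restrict by metadata first, then rank the survivors by distance:

```python
import math

# Hypothetical rows: id, embedding, and metadata, mirroring the schema above
docs = [
    {"id": 1, "embedding": [0.1, 0.9], "metadata": {"user_id": 7}},
    {"id": 2, "embedding": [0.8, 0.2], "metadata": {"user_id": 7}},
    {"id": 3, "embedding": [0.9, 0.1], "metadata": {"user_id": 8}},
]

def search(query, user_id, k=1):
    # Filter on metadata first, then order the remaining rows by L2 distance
    candidates = [d for d in docs if d["metadata"]["user_id"] == user_id]
    candidates.sort(key=lambda d: math.dist(query, d["embedding"]))
    return [d["id"] for d in candidates[:k]]

print(search([1.0, 0.0], user_id=7))  # [2] -- doc 3 is closer but filtered out
```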
Ask two questions: Do you need to search by semantic similarity rather than exact matches? Will your collection grow beyond what a brute-force scan can handle? If yes to either, a vector index—inside Postgres or a purpose-built system like Pinecone, Weaviate, or Milvus—pays off.
The pgvector extension adds a vector column type and ANN indexes to PostgreSQL, letting you stay inside a familiar relational stack.
# Homebrew example (macOS)
brew install pgvector
psql -d mydb -c "CREATE EXTENSION IF NOT EXISTS vector;"
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT,
embedding VECTOR(1536)
);
-- Add an HNSW index for fast cosine search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
import openai, psycopg2

openai.api_key = "YOUR_OPENAI_KEY"
conn = psycopg2.connect("postgres://@localhost/mydb")
cur = conn.cursor()

texts = ["Galaxy boosts SQL productivity", "Vector DB powers semantic search"]
for t in texts:
    resp = openai.Embedding.create(model="text-embedding-3-small", input=t)
    vec = resp["data"][0]["embedding"]
    # pgvector accepts the '[x, y, ...]' text format, so cast the list to a string
    cur.execute("INSERT INTO documents (content, embedding) VALUES (%s, %s)",
                (t, str(vec)))
conn.commit()
-- 1. Obtain the query vector (Python or your LLM of choice)
SELECT content, embedding <=> '[0.123, 0.456, ...]' AS distance
FROM documents
ORDER BY distance LIMIT 5;
<=> is pgvector’s operator for cosine distance; smaller values mean closer matches.
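One practical detail when building that query from client code: pgvector expects the vector in its '[x,y,z]' text form. A small, hypothetical helper (not part of pgvector itself) handles the conversion:

```python
def to_pgvector(vec):
    """Render a Python list as a pgvector text literal, e.g. '[0.123,0.456]'."""
    return "[" + ",".join(repr(float(x)) for x in vec) + "]"

print(to_pgvector([0.123, 0.456]))  # [0.123,0.456]
```

You can then pass the result as an ordinary string parameter to your SQL driver instead of interpolating it into the query text.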
Because pgvector lives inside PostgreSQL, you can connect your database to Galaxy in seconds. Benefits include first-class support for vector columns and operators (<=>, <#>).

-- Galaxy Snippet: Top K similar documents
SELECT id, content, embedding <=> :query_vec AS distance
FROM documents
WHERE created_at > NOW() - INTERVAL '30 days'
ORDER BY distance
LIMIT :k;
You can share the snippet via a Galaxy Collection—no more copying raw SQL into Slack.
pgvector brings vectors to Postgres.

Next steps: add embeddings for your documentation repo, create a semantic “search the docs” route, and watch user support tickets drop.