This resource demystifies database indexes—special data structures that accelerate query performance. You’ll learn how indexes work under the hood, why they matter, when to use (or skip) them, and how Galaxy’s editor helps you design and test indexes efficiently.
An index is a secondary data structure that the database keeps alongside your table to provide fast lookup paths. Think of it as the index at the back of a textbook: instead of flipping through every page (a full table scan), you jump straight to the topic’s page number.
Most relational databases (PostgreSQL, MySQL, SQL Server, SQLite) default to a B-tree index—an ordered, balanced tree that allows O(log n) searches, inserts, and deletes.
Imagine a phone book:
last_name
is the alphabetical tabs on the side. You flip directly to “S” instead of scanning every page.Indexes provide two primary benefits:
However, they come with costs:
INSERT
, UPDATE
, DELETE
) because the index must be updated.Run the following script in Galaxy’s SQL editor (PostgreSQL syntax):
-- Example dataset: 1M e-commerce orders
CREATE TABLE orders (
id BIGSERIAL PRIMARY KEY,
customer_id INT NOT NULL,
status TEXT NOT NULL,
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
total_cents INT NOT NULL
);
-- Insert dummy data (1M rows)
INSERT INTO orders (customer_id, status, total_cents)
SELECT (RANDOM()*100000)::INT, 'placed', (RANDOM()*10000)::INT
FROM generate_series(1, 1000000);
You now have a orders
table with one million rows and no additional indexes.
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
Observe the output in Galaxy’s results pane. You should see a Seq Scan (sequential scan) taking hundreds of milliseconds, reading the entire table.
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
Re-run the query:
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
You should now see an Index Scan with a runtime ~10× faster. Galaxy highlights the cost difference side-by-side if you use its Explain Diff feature.
If your application often filters on status
and customer_id
, you may create a composite index:
CREATE INDEX idx_orders_customer_status
ON orders(customer_id, status);
Order matters: The index can be used for queries on (customer_id)
or (customer_id, status)
, but not for status
alone.
A covering index stores additional columns so the database can satisfy the query entirely from the index without touching the table (a “index-only scan”).
CREATE INDEX idx_cover_orders
ON orders(customer_id)
INCLUDE (total_cents, created_at);
For reporting queries that only need these columns, I/O drops further.
Measure performance of this query before and after creating a covering index:
SELECT customer_id, total_cents, created_at
FROM orders
WHERE customer_id = 123
ORDER BY created_at DESC
LIMIT 10;
Document your findings in a Galaxy Notebook cell.
The B-tree splits keys into ordered pages on disk. Each page stores pointers to child pages or row positions (CTID). The tree stays balanced so that the height is minimal (usually 2–4 levels), keeping lookup time logarithmic.
Choose the type that matches your query patterns.
is_active
).In PostgreSQL:
SELECT *
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC LIMIT 10;
Drop indexes with idx_scan = 0
for long periods.
Bloat accumulates from dead rows. Schedule VACUUM
or REINDEX CONCURRENTLY
.
Use idx_
__
to keep things readable.
CREATE INDEX
statements referencing your schema.Past a point, write overhead outweighs read gains. Index only what you query.
The planner may still choose a sequential scan if statistics say it’s cheaper (e.g., expecting many rows). Use ANALYZE
to refresh stats.
It does. Put the most selective, commonly filtered column first.
EXPLAIN ANALYZE
– Is your index being used?pg_stat_all_tables
– Are stats up-to-date?SET enable_seqscan = off;
temporarily to test impact.customer_id, created_at
to prevent duplicate orders made at the exact same timestamp.events(timestamp, payload)
table with 100 M rows of time-series data. Measure storage savings.LOWER(email)
, create a function-based index to speed it up.EXPLAIN ANALYZE
and spreadsheet the results.EXPLAIN ANALYZE
—never assume.