DuckDB vs SQLite Benchmarks

Galaxy Glossary

How does DuckDB performance compare to SQLite in analytics benchmarks?

A quantitative comparison of DuckDB and SQLite performance, resource usage, and suitability for analytical workloads through reproducible benchmarks.


Overview

DuckDB and SQLite are both lightweight, embedded SQL engines, but they target different workload profiles. SQLite excels in transactional, row-oriented operations common in mobile and edge applications, while DuckDB is column-oriented and optimized for OLAP analytics on modern hardware. Benchmarking the two engines illuminates their strengths, limitations, and configuration nuances so engineers can pick the right tool for the job.

Architectural Differences

Storage Layout

SQLite stores data in a B-tree row format. Each page contains multiple rows, making point lookups fast but scans costly.
DuckDB stores data in columnar vectors. Columnar storage compresses well and supports SIMD-accelerated scans, giving it an edge on aggregations and analytical queries.

Execution Engine

DuckDB uses a vectorized execution pipeline that processes ~1024 values per operation, heavily exploiting CPU caches. SQLite executes one row at a time via a virtual machine, prioritizing simplicity and low memory.
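
A quick way to see this difference is to ask each engine how it plans the same aggregation. The sketch below assumes a trips table has already been loaded into both files; DuckDB's EXPLAIN prints the physical operator pipeline its vectorized engine will run, while SQLite's EXPLAIN QUERY PLAN shows its scan strategy.

# plan_compare.py
import duckdb, sqlite3

QUERY = "SELECT passenger_count, SUM(total_amount) FROM trips GROUP BY passenger_count"

# DuckDB: physical plan executed by the vectorized, parallel engine
duck = duckdb.connect('taxi.duckdb')
for _, plan in duck.execute("EXPLAIN " + QUERY).fetchall():
    print(plan)

# SQLite: row-at-a-time scan strategy for the same query
lite = sqlite3.connect('taxi.sqlite')
for row in lite.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(row)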

Concurrency Model

SQLite is single-writer, multi-reader (via WAL), whereas DuckDB allows parallel reads and writes thanks to MVCC and morsel-driven parallelism.

Why Benchmarks Matter

Choosing between the two engines purely by anecdote can lead to costly mistakes. Quantitative benchmarks reveal:

  • Query latency under cold and warm cache
  • Throughput with multi-core CPUs
  • Memory and disk footprint
  • Impact of pragma/configuration tweaks

Benchmark Methodology

Dataset

The NYC Taxi 2019 trip dataset (1.1 B rows), downsampled to 100 M rows and stored in both Parquet and CSV. This mix stresses I/O, CPU, and compression.
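
Preparing the sample can itself be done with DuckDB. The snippet below is a minimal sketch: the source glob, the USING SAMPLE clause, and the output file names are illustrative and should be adapted to wherever the raw trip files live.

# prepare_dataset.py
import duckdb

con = duckdb.connect()
# Hypothetical location of the raw 2019 trip files
con.execute("""
CREATE TABLE trips AS
SELECT * FROM read_parquet('yellow_tripdata_2019-*.parquet')
USING SAMPLE 100000000 ROWS
""")
# Export the identical 100 M row sample to both formats
con.execute("COPY trips TO 'trips_100m.parquet' (FORMAT PARQUET)")
con.execute("COPY trips TO 'trips_100m.csv' (HEADER, DELIMITER ',')")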

Environment

• 8-core Apple M2 Pro, 32 GB RAM
• macOS 14.4
• DuckDB 0.10.2, SQLite 3.45.2 compiled with -O3 -DSQLITE_ENABLE_COLUMN_METADATA
• Cold cache: drop OS page cache before each run
• Warm cache: repeat immediately

Queries

  • Q1: COUNT(*)
  • Q2: Group by passenger_count, SUM(fare_amount)
  • Q3: Top-10 routes by revenue
  • Q4: Point lookup on primary key
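
Expressed as reusable SQL strings, the four queries might look like the sketch below; the trips table and column names such as pulocationid and trip_id are assumptions about the loaded schema, so adjust them to match your import.

# queries.py
QUERIES = {
    "Q1": "SELECT COUNT(*) FROM trips",
    "Q2": "SELECT passenger_count, SUM(fare_amount) FROM trips GROUP BY passenger_count",
    # Route = pickup/dropoff zone pair (column names assumed)
    "Q3": """
        SELECT pulocationid, dolocationid, SUM(total_amount) AS revenue
        FROM trips
        GROUP BY pulocationid, dolocationid
        ORDER BY revenue DESC
        LIMIT 10
    """,
    # Assumes a synthetic integer primary key added during loading
    "Q4": "SELECT * FROM trips WHERE trip_id = 123456",
}

Running identical strings against both engines keeps the comparison apples-to-apples.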

Key Results

  • Aggregations (Q1-Q3): DuckDB ran 12-35× faster cold and 8-20× faster warm. Columnar scans and parallelism dominate.
  • Point lookup (Q4): SQLite edged out DuckDB by ~20% thanks to its B-tree design.
  • Memory: DuckDB consumed ~2.3 GB peak vs. SQLite’s 480 MB, trading RAM for speed.
  • Disk: DuckDB’s automatic compression shrank the dataset to 28 GB vs. SQLite’s 92 GB.

Best Practices For Reproducible Benchmarks

1. Isolate I/O

Use tmpfs or NVMe to minimize storage bottlenecks. Always record cold and warm cache numbers.

2. Control Pragmas

• SQLite: PRAGMA journal_mode=WAL; plus PRAGMA synchronous=OFF; for write-heavy tests.
• DuckDB: set PRAGMA threads to the number of available cores and cap memory_limit to avoid swapping (see the sketch below).
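
Applied from Python, the tuning might look like this minimal sketch (the 24 GB memory limit is an assumption that leaves headroom below the 32 GB of RAM in the test machine):

# tune_pragmas.py
import os, duckdb, sqlite3

lite = sqlite3.connect('taxi.sqlite')
lite.execute("PRAGMA journal_mode=WAL")
lite.execute("PRAGMA synchronous=OFF")             # only for write-heavy tests

duck = duckdb.connect('taxi.duckdb')
duck.execute(f"PRAGMA threads={os.cpu_count()}")   # use every core
duck.execute("PRAGMA memory_limit='24GB'")         # avoid swapping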

3. Uniform Data Types

Normalize schema so both engines process identical column types (e.g., avoid TEXT vs VARCHAR mismatches).
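
One way to enforce this is to run the same DDL against both engines; the column list below is an assumed subset of the taxi schema (SQLite maps DOUBLE and BIGINT to its own type affinities, but the declared schema stays identical):

# shared_schema.py
import duckdb, sqlite3

DDL = """
CREATE TABLE trips (
    trip_id          BIGINT,
    passenger_count  INTEGER,
    fare_amount      DOUBLE,
    total_amount     DOUBLE
)
"""

duckdb.connect('taxi.duckdb').execute(DDL)
sqlite3.connect('taxi.sqlite').execute(DDL)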

4. Automate With Scripts

Wrap each run in a shell script that drops caches, loads the database, executes queries, and logs timestamps.
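
The wrapper does not have to be shell; a minimal Python equivalent is sketched below, assuming macOS (where sudo purge drops the page cache) and the taxi.duckdb / taxi.sqlite files built earlier.

# bench_runner.py
import subprocess, time, duckdb, sqlite3

def drop_os_cache():
    # macOS; on Linux write to /proc/sys/vm/drop_caches instead (requires root)
    subprocess.run(["sudo", "purge"], check=True)

def run(label, connect, query):
    drop_os_cache()                          # cold-cache run
    con = connect()
    t0 = time.time()
    con.execute(query).fetchall()
    cold = time.time() - t0
    t1 = time.time()                         # warm-cache run, repeated immediately
    con.execute(query).fetchall()
    warm = time.time() - t1
    print(f"{label}: cold={cold:.2f}s warm={warm:.2f}s")

QUERY = "SELECT passenger_count, SUM(total_amount) FROM trips GROUP BY passenger_count"
run("DuckDB", lambda: duckdb.connect('taxi.duckdb'), QUERY)
run("SQLite", lambda: sqlite3.connect('taxi.sqlite'), QUERY)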

Practical Example: 100 M Row Aggregation

# duckdb_bench.py
import duckdb, time

con = duckdb.connect('taxi.duckdb')
start = time.time()
# fetchall() forces full materialization so the timing covers the whole query
rows = con.execute("""
SELECT passenger_count, SUM(total_amount) AS revenue
FROM trips
GROUP BY passenger_count
""").fetchall()
print(f"DuckDB runtime: {time.time()-start:.2f}s")
con.close()

# sqlite_bench.py
import sqlite3, time

con = sqlite3.connect('taxi.sqlite')
start = time.time()
# fetchall() drains the cursor so SQLite actually finishes the aggregation
rows = con.execute("""
SELECT passenger_count, SUM(total_amount) AS revenue
FROM trips
GROUP BY passenger_count
""").fetchall()
print(f"SQLite runtime: {time.time()-start:.2f}s")
con.close()

Common Mistakes

Misinterpreting Cache Effects

Running only warm-cache benchmarks skews the comparison: whichever engine’s files fit in the OS page cache gets an artificial boost, and the cold-start behavior that matters in production is never measured.

Ignoring PRAGMA Tuning

SQLite defaults to synchronous=FULL, which slows writes. DuckDB’s default thread and memory settings may not match your hardware, so set threads and memory_limit explicitly before benchmarking on a 32-core server.

Comparing CSV vs. Parquet

Benchmarking DuckDB on Parquet but forcing SQLite to ingest CSV biases results. Always import into native formats first.
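
A minimal sketch of that import step, reusing the hypothetical trips_100m files from the dataset preparation above:

# import_native.py
import duckdb, sqlite3
import pandas as pd

# DuckDB: load the Parquet sample into its native columnar file
duck = duckdb.connect('taxi.duckdb')
duck.execute("CREATE TABLE trips AS SELECT * FROM read_parquet('trips_100m.parquet')")

# SQLite: load the CSV sample into its native B-tree file, chunked to bound memory
lite = sqlite3.connect('taxi.sqlite')
for chunk in pd.read_csv('trips_100m.csv', chunksize=1_000_000):
    chunk.to_sql('trips', lite, if_exists='append', index=False)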

Galaxy & Benchmarks

Because Galaxy embeds both DuckDB and SQLite drivers, you can iterate on benchmark SQL inside the same editor, view timing breakdowns, and share results via Collections. Galaxy’s AI Copilot can even refactor identical queries for both engines, ensuring apples-to-apples tests.

Takeaways

• DuckDB dominates analytical scans and large aggregations.
• SQLite remains the king of tiny binaries and point lookups.
• Match engine to workload: mobile apps → SQLite, desktop analytics → DuckDB.
• Reproducible benchmarking demands controlled environment variables and identical schemas.

Why DuckDB vs SQLite Benchmarks are important

Choosing the wrong embedded database can bottleneck analytics workloads or waste resources. Understanding benchmark outcomes ensures you pick the engine that maximizes throughput, minimizes cost, and aligns with your application’s read or write patterns.

DuckDB vs SQLite Benchmarks Example Usage


Compare query execution time between DuckDB and SQLite for a 100M row aggregation
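
A minimal sketch of that comparison, assuming both native files already contain the 100 M row trips table:

# example_usage.py
import time, duckdb, sqlite3

SQL = "SELECT passenger_count, SUM(total_amount) FROM trips GROUP BY passenger_count"

for label, con in [("DuckDB", duckdb.connect('taxi.duckdb')),
                   ("SQLite", sqlite3.connect('taxi.sqlite'))]:
    t0 = time.time()
    con.execute(SQL).fetchall()              # fetch so timing covers the full query
    print(f"{label}: {time.time() - t0:.2f}s")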

Frequently Asked Questions (FAQs)

Is DuckDB always faster than SQLite?

No. DuckDB excels at analytical scans and aggregations, but SQLite can outperform DuckDB on single-row lookups and ultra-low-memory devices.

What hardware benefits DuckDB most?

Multi-core CPUs with large L3 caches and fast NVMe or abundant RAM allow DuckDB’s vectorized, parallel engine to shine.

Can I benchmark DuckDB and SQLite inside Galaxy?

Yes. Galaxy supports both drivers, query timing, and AI-assisted query translation, making side-by-side benchmarks straightforward.

How large can a DuckDB database grow?

DuckDB uses 64-bit offsets and can theoretically store petabytes, but practical limits depend on disk capacity and file system constraints.
