Using Presto’s Hive or native Delta connectors to read transactionally consistent Delta Lake tables stored on object storage or HDFS.
Learn how to unlock the power of Delta Lake’s ACID-compliant parquet tables with the speed of Presto’s distributed SQL engine.
Delta Lake is an open-source storage layer that adds ACID transactions, schema evolution, versioned data, and time travel to parquet files stored on object storage or HDFS. A Delta table is simply a directory of parquet data files accompanied by a _delta_log directory that tracks every commit in JSON files and parquet checkpoints. By reading the log before scanning data files, an engine can reconstruct a snapshot-consistent view of the table without locking the underlying files.
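The replay idea can be sketched in a few lines of Python. This is a simplified illustration of the mechanism, not the connector’s actual code: real commit logs also carry metadata, protocol, and checkpoint actions, which are ignored here.

```python
import json

def current_files(commits):
    """Replay Delta commit logs in version order.

    Each element of `commits` is the text of one _delta_log/<version>.json
    file, where every line is a JSON "action" such as
    {"add": {"path": ...}} or {"remove": {"path": ...}}.
    The live snapshot is the set of added files minus removed ones.
    """
    live = set()
    for commit in commits:
        for line in commit.splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return sorted(live)

# Two hypothetical commits: version 0 adds two files, version 1
# compacts them into a single file.
commit0 = "\n".join([
    json.dumps({"add": {"path": "part-0000.parquet"}}),
    json.dumps({"add": {"path": "part-0001.parquet"}}),
])
commit1 = "\n".join([
    json.dumps({"remove": {"path": "part-0000.parquet"}}),
    json.dumps({"remove": {"path": "part-0001.parquet"}}),
    json.dumps({"add": {"path": "part-0002.parquet"}}),
])
print(current_files([commit0, commit1]))  # → ['part-0002.parquet']
```

An engine that replays the log this way sees a consistent snapshot even while a writer is appending new commits, which is why no file locking is needed.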
Presto (and its fork Trino) shines as a federated MPP SQL engine capable of interactive analytics at petabyte scale. Querying Delta tables from Presto lets you combine ACID lakehouse data with any other connected source, interactively and without copying data.
There are two main integration patterns, depending on which Presto distribution you use:
Recent versions of Trino (formerly PrestoSQL) ship a Delta Lake connector that reads the _delta_log directly. Configuration is as simple as:
etc/catalog/delta.properties:
connector.name=delta-lake
hive.metastore.uri=thrift://hms:9083
Point the catalog at a Hive metastore (or AWS Glue) that tracks the table location. The connector handles snapshot resolution, schema, partition pruning, and predicate pushdown automatically.
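Once the catalog is registered, Delta tables join like any other Trino source. For example, a hypothetical query enriching Delta events with a Postgres dimension table (catalog, schema, and column names here are illustrative, not from the original walkthrough):

```sql
-- Federated join: Delta Lake events + Postgres users in one query
SELECT u.country, count(*) AS events
FROM delta.analytics.events e
JOIN postgres.public.users u ON u.id = e.user_id
GROUP BY u.country
ORDER BY events DESC;
```

The Delta side benefits from the connector’s partition pruning and predicate pushdown, while the Postgres side is fetched through its own connector.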
The original PrestoDB line does not yet include a dedicated Delta connector, but you can enable Delta support in the Hive connector by flipping a flag:
etc/catalog/hive.properties:
connector.name=hive-hadoop2
delta.enabled=true
When delta.enabled=true, Presto’s Hive split enumerator recognizes Delta directories, parses the JSON log, and exposes the table just like any Hive table.
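With the flag enabled, a Delta directory registered in the metastore is queryable through the hive catalog like any other table. A sketch, with an illustrative schema and table name:

```sql
-- PrestoDB: Delta table surfaced through the Hive connector
SELECT count(*) FROM hive.analytics.delta_events;
```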
spark.read.parquet('s3://raw/events') \
    .write.format('delta') \
    .partitionBy('event_date') \
    .save('s3://gold/delta/events')
CREATE EXTERNAL TABLE delta_events
USING DELTA
LOCATION 's3://gold/delta/events';
In etc/catalog/delta.properties or hive.properties, ensure the Hive metastore URI, S3 credentials, and (optionally) delta.hide-non-delta-lake-tables=true are set. Then query the table from Presto:

SELECT event_type, count(*) AS n
FROM delta.delta_events
WHERE event_date = DATE '2024-04-15'
GROUP BY event_type
ORDER BY n DESC;
Presto will read the latest _delta_log/*.json commit (and any parquet checkpoint) to resolve the snapshot before scanning data files. A few practices keep those scans fast:

- Run OPTIMIZE or VACUUM in Spark to compact small files and remove tombstones; Presto scans fewer objects.
- Partition on a selective column such as event_date. Avoid over-partitioning; Presto already pushes predicates into parquet.
- DELTA_LAKE_CONCURRENT_READS: Trino’s connector can parallelize JSON log parsing; set this session property to speed up huge commit logs.
- GENERATE symlink_format_manifest: if you run very old Presto versions, point Hive at the manifest to avoid Delta-specific logic.

Delta Lake’s history is fully accessible from Presto. You can query a snapshot as of a particular version or timestamp:
-- Snapshot by version
SELECT *
FROM delta.delta_events FOR VERSION AS OF 325;
-- Snapshot by timestamp
SELECT *
FROM delta.delta_events
FOR TIMESTAMP AS OF TIMESTAMP '2024-04-10 00:00:00';
This is invaluable for debugging data issues, auditing, or building slowly changing dimension logic without physically copying data.
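To find a version number worth traveling to, Trino’s Delta connector also exposes commit metadata through a $history table. A sketch, assuming the delta_events table from the walkthrough above (availability depends on connector version):

```sql
-- Recent commits: pick a version for FOR VERSION AS OF
SELECT version, timestamp, operation
FROM delta."delta_events$history"
ORDER BY version DESC
LIMIT 10;
```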
When Delta tables are updated by Spark, Presto automatically sees new snapshots; no refresh is required. Starting with Trino 410, you can even run DELETE, UPDATE, and MERGE directly from Presto, making the warehouse fully bi-directional.
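On a Trino version with Delta writes enabled, an upsert might look like the following. Table and column names are illustrative, and events_staging is a hypothetical staging table:

```sql
-- Upsert staged rows into the Delta table from Trino
MERGE INTO delta.analytics.events t
USING delta.analytics.events_staging s
  ON t.event_id = s.event_id
WHEN MATCHED THEN
  UPDATE SET event_type = s.event_type
WHEN NOT MATCHED THEN
  INSERT (event_id, event_type, event_date)
  VALUES (s.event_id, s.event_type, s.event_date);
```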
If a SELECT fails with Table not found, ensure you created an external table or enabled delta.register-table-on-read. If a table uses Delta protocol features newer than the connector understands, set delta.allow-high-protocol=true to read it (read-only).

Because Galaxy speaks native Presto over JDBC, you can add your Presto cluster as a connection once and instantly explore every Delta table through Galaxy’s AI copilot, autocomplete, and shared collections:
- Type SELECT * FROM and Galaxy suggests Delta tables plus column docs.
- Share an endorsed events_last_30_days query so teammates stop copy-pasting SQL in Slack.

Querying Delta tables from Presto brings together the best of both worlds: ACID data lakes and blazing-fast distributed SQL. With a few configuration tweaks you can run sub-second analytics, join Delta with any other source, enjoy point-in-time time-travel queries, and surface everything through developer-focused tools like Galaxy.
Enable Delta in your Presto catalog today and unlock a unified, scalable lakehouse that just speaks SQL.
Modern data platforms often land raw data in object storage as Delta Lake while interactive workloads rely on Presto for federated SQL. Bridging the two enables analysts and applications to leverage ACID guarantees without duplicating data or moving it into specialized warehouses. Mastering this integration eliminates silos, reduces latency, and supports advanced features such as time-travel debugging and in-place updates—all critical capabilities for data engineers who build reliable, self-service analytics layers.
Can Presto write to Delta tables?
Recent Trino versions (410+) and Starburst Enterprise allow INSERT, DELETE, UPDATE, and MERGE statements on Delta tables. PrestoDB remains read-only today.
Do I need a Hive metastore?
No. The Delta connector can use the filesystem directly if register-table-on-read=true is set. However, a metastore simplifies discovery, permissions, and statistics.
How does Galaxy fit in?
Galaxy’s desktop SQL editor connects to Presto over JDBC, autocompletes Delta tables, suggests column docs, and lets teams share reviewed queries in Collections—speeding up collaboration.
Which engine versions should I use?
Use Trino 356+ or Starburst 364+ for stable Delta read support, and Trino 410+ for write operations. For PrestoDB, run 0.282 or later with delta.enabled=true.