Configuring Debezium for PostgreSQL Change Data Capture

Learn step-by-step how to turn a regular PostgreSQL database into a real-time data stream using Debezium, Kafka Connect, and logical replication.

Configuring Debezium for PostgreSQL CDC involves preparing the database for logical replication, deploying a Debezium PostgreSQL connector on Kafka Connect, and tuning properties so row-level changes are streamed reliably into Kafka topics for downstream consumption.
Debezium is an open-source change data capture (CDC) platform that captures row-level changes from databases and publishes them to Apache Kafka topics. By reading the database's transaction log (the write-ahead log, or WAL, for PostgreSQL), Debezium emits every INSERT, UPDATE, and DELETE as an immutable event, enabling downstream systems to stay in sync in real time without intrusive polling jobs or bulk ETL pipelines.
PostgreSQL is one of the most popular operational databases in the world. Combining it with Debezium unlocks a range of use cases: real-time analytics, search indexing, cache invalidation, and event-driven microservices.
Unlike trigger-based approaches, Debezium imposes minimal overhead on the source database. Delivery is at-least-once by default; configured with Kafka's idempotent, transactional producer settings (see below), it can provide effectively exactly-once delivery into Kafka.
You will also need a running Kafka cluster with enough capacity for Kafka Connect's internal topics (such as __consumer_offsets) and the Debezium change topics.

postgresql.conf

Enable logical decoding by setting the following in postgresql.conf:
# /var/lib/postgresql/data/postgresql.conf
wal_level = logical
max_replication_slots = 4 # at least 1 for Debezium
max_wal_senders = 4
# shared_preload_libraries is not required for logical decoding with pgoutput
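If you prefer not to edit the file directly, the same settings can be applied with ALTER SYSTEM (a sketch mirroring the values above; a restart is still required):

-- Persist settings to postgresql.auto.conf; they take effect after a restart
ALTER SYSTEM SET wal_level = 'logical';
ALTER SYSTEM SET max_replication_slots = 4;
ALTER SYSTEM SET max_wal_senders = 4;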
Restart PostgreSQL to apply the changes. wal_level, max_replication_slots, and max_wal_senders can only be changed at server start, so a plain SELECT pg_reload_conf(); is not sufficient for them.
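Once PostgreSQL is back up, confirm the settings from any SQL session:

SHOW wal_level;              -- expect 'logical'
SHOW max_replication_slots;
SHOW max_wal_senders;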
pg_hba.conf

Add an entry so Kafka Connect can connect and create a replication slot. Logical replication connections are matched by database name (the replication keyword only matches physical replication):

host inventory debezium_user 10.0.0.0/16 md5

Then create the role Debezium will connect as:
CREATE ROLE debezium_user WITH REPLICATION LOGIN PASSWORD '********';
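The initial snapshot reads the captured tables with ordinary SELECT statements, so the role also needs read access. A minimal grant set, assuming the public.customers and public.orders tables used in the connector example below:

GRANT CONNECT ON DATABASE inventory TO debezium_user;
GRANT USAGE ON SCHEMA public TO debezium_user;
GRANT SELECT ON public.customers, public.orders TO debezium_user;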
PostgreSQL 10+ ships with the pgoutput plugin, which is the preferred choice. Debezium 2.x supports pgoutput and decoderbufs; the older wal2json plugin is only supported by Debezium 1.x releases.

Deploy Kafka Connect with the Debezium container image, for example via Docker Compose:
services:
  connect:
    image: debezium/connect:2.6
    environment:
      - BOOTSTRAP_SERVERS=kafka:29092
      - GROUP_ID=1
      - CONFIG_STORAGE_TOPIC=connect_configs
      - OFFSET_STORAGE_TOPIC=connect_offsets
      - STATUS_STORAGE_TOPIC=connect_statuses
    ports:
      - "8083:8083"
    volumes:
      - ./plugins:/kafka/connect/plugins   # optional extra plugins
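Start the service (assuming the snippet above lives in docker-compose.yml next to your Kafka broker definition, which is omitted here):

docker compose up -d connect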
The Debezium image already contains the debezium-connector-postgres JAR.
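You can confirm that Connect discovered the connector class via its standard REST API (host and port as mapped in the Compose file):

curl -s http://localhost:8083/connector-plugins | grep -i postgres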
Register the connector by POSTing the following JSON to the Kafka Connect REST API at POST /connectors. Note that Debezium 2.x uses topic.prefix where 1.x used database.server.name:
{
  "name": "inventory-postgres-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db",
    "database.port": "5432",
    "database.user": "debezium_user",
    "database.password": "********",
    "database.dbname": "inventory",
    "topic.prefix": "pg1",
    "slot.name": "debezium_slot",
    "publication.autocreate.mode": "filtered",
    "plugin.name": "pgoutput",
    "table.include.list": "public.customers,public.orders",
    "tombstones.on.delete": "false",
    "snapshot.mode": "initial",
    "poll.interval.ms": "1000",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 6
  }
}
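For example, assuming the JSON above is saved as inventory-connector.json and Connect is reachable on localhost:8083:

curl -s -X POST -H "Content-Type: application/json" \
  --data @inventory-connector.json \
  http://localhost:8083/connectors

# Then check that the task is RUNNING
curl -s http://localhost:8083/connectors/inventory-postgres-connector/status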
Upon registration Debezium will create the replication slot (verify with SELECT * FROM pg_replication_slots;), take the initial snapshot of the included tables, and then stream every subsequent change into per-table topics such as pg1.public.customers.
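To see change events flow, tail one of the topics with the standard Kafka console consumer (a sketch; the bootstrap address depends on your listener configuration):

kafka-console-consumer.sh \
  --bootstrap-server localhost:9092 \
  --topic pg1.public.customers \
  --from-beginning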
slot.name
Name of the PostgreSQL replication slot. Avoid auto-generated names so you can re-attach a connector after a failure without losing WAL.

publication.autocreate.mode
filtered automatically creates a publication containing only the tables in table.include.list. If you manage publications manually, set it to disabled.

snapshot.mode
initial – take a blocking, consistent snapshot before streaming
initial_only – take the snapshot, then stop (no streaming)
exported – leverages pg_export_snapshot() for minimal locking (older releases; recent versions fold this behavior into initial)
never – start streaming from the current LSN, useful when log retention guarantees consistency

plugin.name
Must match the logical decoding plugin loaded by PostgreSQL. Common values: pgoutput and decoderbufs (wal2json on Debezium 1.x only).
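If you set publication.autocreate.mode to disabled, create the publication yourself before starting the connector. A sketch assuming Debezium's default publication name dbz_publication and the example tables:

CREATE PUBLICATION dbz_publication
  FOR TABLE public.customers, public.orders;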
Set the following producer overrides to strengthen delivery guarantees toward exactly-once; idempotence and acks=all eliminate broker-side duplicates and reordering:
"producer.override.enable.idempotence": "true",
"producer.override.acks": "all",
"producer.override.transaction.timeout.ms": 900000,
"producer.override.max.in.flight.requests.per.connection": 1
Debezium exposes JMX metrics worth watching:
source-record-poll-total – WAL events per poll
snapshot-running, streaming-running – which phase the connector is in
number-of-events-sent – emitted Kafka records

On the database side, monitor how much WAL each replication slot is retaining:

SELECT slot_name,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained
FROM pg_replication_slots;

A large retained value indicates a slow connector or downstream backlog. Increase Kafka throughput or add partitions.
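When a connector is permanently decommissioned, drop its slot so PostgreSQL stops retaining WAL for it (shown with the slot name from the example config; never drop a slot an active connector still uses):

SELECT pg_drop_replication_slot('debezium_slot');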
Debezium emits a schema field with every change event and maintains separate schema-changes topics, so consumers can adapt in near real time. For incompatible changes (e.g., a column drop), adopt a double-write pattern: create a new table, migrate traffic, then exclude the old table in the connector config.
Grant the connector role REPLICATION but no DDL privileges, and set database.sslmode=require in the connector config if PostgreSQL is TLS-enabled.

Assume an e-commerce platform wants to keep its product catalog in Elasticsearch. Instead of periodic SQL dumps, Debezium streams row updates directly. A Kafka Connect sink connector indexing into Elasticsearch ensures search results are updated within milliseconds of the database commit: no cron jobs, no full reloads.
While Debezium operates at the replication layer, you can still write SQL in an IDE like Galaxy to verify replication-slot statistics, confirm triggerless CDC, and inspect connector heartbeat tables. Galaxy's autocomplete and sharing features help teams maintain these diagnostic queries centrally.
Configuring Debezium for PostgreSQL CDC is primarily about enabling logical replication, granting minimal credentials, and carefully setting connector properties. Once in place, you gain a robust event stream that can power analytics, microservices and search—all without touching application code.
Building modern data platforms often requires near real-time replication of database changes into analytical stores, caches, or microservices. Debezium provides a low-latency, reliable way to extract PostgreSQL changes without intrusive triggers or full-table scans, enabling architectures like event sourcing and data meshes.
Can I enable logical replication without downtime?
No. Changing wal_level from replica to logical requires a restart, but this can be performed during a regular maintenance window. Afterward, Debezium can be added without further downtime.
Does Debezium capture schema changes made with ALTER TABLE?
Yes. Debezium emits DDL events to a special schema-changes topic, enabling consumers to adapt automatically or trigger migrations.
What if the connector is offline for a while?
As long as the WAL files covering the downtime are retained and the replication slot remains, restarting the connector will fast-forward through the missed LSNs. Ensure wal_keep_size or WAL archiving covers your maximum outage window.
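wal_keep_size can be raised without a restart; a sketch, sized here arbitrarily (tune it to your own outage tolerance):

ALTER SYSTEM SET wal_keep_size = '4GB';
SELECT pg_reload_conf();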
Can I use Galaxy to monitor Debezium?
Yes. You can run the replication-slot diagnostic queries shown above inside Galaxy's SQL editor, save them in a shared Collection, and let teammates rerun or endorse them for operational checks.