Configuring Debezium for PostgreSQL Change Data Capture (CDC)

Galaxy Glossary

How do I configure Debezium for PostgreSQL CDC?

Configuring Debezium for PostgreSQL CDC involves preparing the database for logical replication, deploying a Debezium PostgreSQL connector on Kafka Connect, and tuning properties so row-level changes are streamed reliably into Kafka topics for downstream consumption.


Description

Configuring Debezium for PostgreSQL Change Data Capture

Learn step-by-step how to turn a regular PostgreSQL database into a real-time data stream using Debezium, Kafka Connect and logical replication.

What Is Debezium?

Debezium is an open-source change data capture (CDC) platform that captures row-level changes from databases and publishes them to Apache Kafka topics. By reading the database’s transaction log (WAL for PostgreSQL) Debezium emits every INSERT, UPDATE and DELETE as an immutable event, enabling downstream systems to stay in sync in real time without intrusive polling jobs or bulk ETL pipelines.

Why PostgreSQL + Debezium Matters

PostgreSQL is one of the most popular operational databases in the world. Combining it with Debezium unlocks a range of use-cases:

  • Real-time data warehousing and analytics
  • Search index synchronization (e.g., Elasticsearch, OpenSearch)
  • Cache invalidation or update (Redis, Memcached)
  • Event-driven microservices that react to database changes

Unlike trigger-based approaches, Debezium imposes minimal overhead on the source database. Delivery is at-least-once by default; exactly-once semantics are achievable when the connector runs on Kafka Connect with exactly-once source support enabled (Kafka 3.3+).

Prerequisites

Software Versions

  • PostgreSQL ≥ 10 (logical replication included natively)
  • Apache Kafka ≥ 2.7
  • Kafka Connect (distributed or standalone) matching Kafka version
  • Debezium PostgreSQL connector JAR matching Debezium platform version

Infrastructure Checklist

  • Network connectivity between PostgreSQL and Kafka Connect
  • Sufficient disk for Kafka __consumer_offsets and Debezium topics
  • Backup policy—CDC does not replace logical or physical backups

Step 1 – Enable Logical Replication in PostgreSQL

Update postgresql.conf

# /var/lib/postgresql/data/postgresql.conf
wal_level = logical
max_replication_slots = 4   # at least 1 for Debezium
max_wal_senders = 4

These parameters are set at server start, so a reload (SELECT pg_reload_conf()) is not enough; restart PostgreSQL:

pg_ctl restart -D /var/lib/postgresql/data

Configure pg_hba.conf

Add entries so the Debezium user can open a replication connection and reach the target database (logical replication connections are matched by database name, not by the replication keyword):

host inventory    debezium_user 10.0.0.0/16 scram-sha-256
host replication  debezium_user 10.0.0.0/16 scram-sha-256   # or md5 on older clusters

Create a Replication-Capable User

CREATE ROLE debezium_user WITH REPLICATION LOGIN PASSWORD '********';
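The replication role also needs read access to the captured tables so the initial snapshot can run. A minimal sketch using psql, assuming the inventory database and table names used throughout this article:

```shell
psql -h db -U postgres -d inventory <<'SQL'
-- Allow the Debezium role to read the tables it will snapshot
GRANT USAGE ON SCHEMA public TO debezium_user;
GRANT SELECT ON public.customers, public.orders TO debezium_user;
SQL
```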

Step 2 – Install Logical Decoding Plugin

PostgreSQL 10+ ships with the pgoutput plugin, which Debezium uses by default, so no extra installation is needed. Older Debezium 1.x releases also supported wal2json, but Debezium 2.x works only with pgoutput or decoderbufs; prefer the native pgoutput plugin.

Step 3 – Deploy Kafka Connect with Debezium Connector

Docker Compose Snippet

services:
  connect:
    image: debezium/connect:2.6
    environment:
      - BOOTSTRAP_SERVERS=kafka:29092
      - GROUP_ID=1
      - CONFIG_STORAGE_TOPIC=connect_configs
      - OFFSET_STORAGE_TOPIC=connect_offsets
      - STATUS_STORAGE_TOPIC=connect_statuses
    ports:
      - "8083:8083"
    volumes:
      - ./plugins:/kafka/connect/plugins   # optional extra plugins

The Debezium image already contains the debezium-connector-postgres JAR.
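Before registering a connector you can confirm the plugin is on the worker's classpath. A quick check, assuming Connect's REST API is reachable on localhost:8083:

```shell
# List installed connector plugins; PostgresConnector should appear
curl -s http://localhost:8083/connector-plugins | grep -i postgres
```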

Step 4 – Register the PostgreSQL Source Connector

POST the following connector configuration JSON to the Kafka Connect REST API at /connectors:

{
  "name": "inventory-postgres-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "db",
    "database.port": "5432",
    "database.user": "debezium_user",
    "database.password": "********",
    "database.dbname": "inventory",
    "topic.prefix": "pg1",

    "slot.name": "debezium_slot",
    "publication.autocreate.mode": "filtered",
    "plugin.name": "pgoutput",

    "table.include.list": "public.customers,public.orders",
    "tombstones.on.delete": "false",

    "snapshot.mode": "initial",
    "poll.interval.ms": "1000",
    "topic.creation.default.replication.factor": 3,
    "topic.creation.default.partitions": 6
  }
}

Note: Debezium 2.x replaced database.server.name with topic.prefix; it supplies the prefix for all change-event topic names.
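The registration itself is a single HTTP call. A sketch assuming the JSON above is saved as register-postgres.json (a filename chosen here for illustration) and Connect listens on localhost:8083:

```shell
# Register the connector
curl -s -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d @register-postgres.json

# Check connector and task state afterwards
curl -s http://localhost:8083/connectors/inventory-postgres-connector/status
```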

Upon registration Debezium will:

  1. Take a consistent snapshot of the included tables
  2. Create a logical replication slot (SELECT * FROM pg_replication_slots;)
  3. Stream WAL changes into Kafka topics like pg1.public.customers
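You can verify the slot and publication from the database side. A sketch assuming the connection details used earlier:

```shell
# The slot created by Debezium should be listed and active
psql -h db -U postgres -d inventory \
  -c "SELECT slot_name, plugin, active FROM pg_replication_slots;"

# With publication.autocreate.mode=filtered, a publication should cover the included tables
psql -h db -U postgres -d inventory \
  -c "SELECT * FROM pg_publication_tables;"
```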

Key Configuration Properties Explained

slot.name

Name of PostgreSQL replication slot. Avoid auto-generated names so you can re-attach a connector after failure without losing WAL.

publication.autocreate.mode

filtered automatically creates a publication containing only the tables in table.include.list. If you manage publications manually, set it to disabled.
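If you manage publications manually, create one before starting the connector. A minimal sketch, assuming the table list from this article and Debezium's default publication name, dbz_publication (override with the publication.name property if you use a different name):

```shell
psql -h db -U postgres -d inventory <<'SQL'
-- dbz_publication is the connector's default publication.name
CREATE PUBLICATION dbz_publication FOR TABLE public.customers, public.orders;
SQL
```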

snapshot.mode

  • initial – take a consistent snapshot of the included tables before streaming (the default)
  • initial_only – snapshot, then stop (no streaming)
  • never – start streaming from the current LSN, useful only when WAL retention guarantees consistency
  • exported – deprecated; its minimal-locking pg_export_snapshot() behavior is now built into initial

plugin.name

Must match the logical decoding plugin loaded by PostgreSQL. Use pgoutput (the default) on PostgreSQL 10+; decoderbufs is the only other option in Debezium 2.x, since wal2json support was removed.

Exactly-Once Delivery

Set the following producer overrides to get idempotent, fully acknowledged writes, which prevents duplicates caused by producer retries:

"producer.override.enable.idempotence": "true",
"producer.override.acks": "all",
"producer.override.transaction.timeout.ms": "900000",
"producer.override.max.in.flight.requests.per.connection": "1"

On their own these settings provide at-least-once delivery without retry duplicates. True exactly-once semantics for source connectors additionally require Kafka 3.3+ with exactly.once.source.support=enabled on the Connect workers and "exactly.once.support": "required" in the connector configuration.

Monitoring & Troubleshooting

Connector Metrics

  • source-record-poll-total – total number of records produced by the connector's poll loop
  • snapshot-running, streaming-running
  • number-of-events-sent – emitted Kafka records

PostgreSQL Replication Lag

SELECT slot_name, pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained
FROM pg_replication_slots;

Large retained WAL indicates slow connector or downstream backlog. Increase Kafka throughput or add partitions.
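If a connector is decommissioned for good, drop its replication slot so PostgreSQL stops retaining WAL for it. A sketch using the slot name from this article; note that dropping the slot discards the connector's position, so only do this when the connector will not return:

```shell
psql -h db -U postgres -d inventory \
  -c "SELECT pg_drop_replication_slot('debezium_slot');"
```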

Best Practices

  • Pin connector & PostgreSQL versions to avoid unexpected breakage on upgrade.
  • Use Avro + Schema Registry for evolution-safe payloads instead of raw JSON.
  • Size WAL retention (max_slot_wal_keep_size, archiving) to cover your longest expected connector outage; if the slot is dropped or required WAL is removed, Debezium loses its position.
  • Run connectors in distributed mode for fault tolerance and rolling upgrades.
  • Encrypt traffic with TLS both between PostgreSQL ↔ Kafka Connect and Connect ↔ Kafka.

Handling Schema Changes

Debezium embeds schema information in every change event and also writes DDL statements to a separate schema-change topic, so consumers can adapt in near real time. For incompatible changes (e.g., dropping a column), adopt a double-write pattern: create a new table, migrate traffic, then remove the old table from the connector's table.include.list.

Secure Your CDC Pipeline

  • Create a dedicated database user with REPLICATION but no DDL privileges.
  • Apply sslmode=require in connector config if PostgreSQL is TLS-enabled.
  • Configure ACLs in Kafka to restrict who can read raw change topics.

Practical Use-Case Example

Assume an e-commerce platform wants to keep its product catalog in Elasticsearch. Instead of periodic SQL dumps, Debezium streams row updates directly, and a Kafka Connect Elasticsearch sink connector indexes them, so search results reflect each database commit within milliseconds: no cron jobs, no full reloads.

Galaxy & Debezium

While Debezium operates at the replication layer, you can still use a SQL editor like Galaxy to check replication-slot statistics, confirm trigger-free CDC, and inspect connector heartbeat tables. Galaxy’s autocomplete and sharing features help teams maintain these diagnostic queries centrally.

Conclusion

Configuring Debezium for PostgreSQL CDC is primarily about enabling logical replication, granting minimal credentials, and carefully setting connector properties. Once in place, you gain a robust event stream that can power analytics, microservices and search—all without touching application code.

Why Configuring Debezium for PostgreSQL Change Data Capture (CDC) is important

Building modern data platforms often requires near real-time replication of database changes into analytical stores, caches, or microservices. Debezium provides a low-latency, reliable way to extract PostgreSQL changes without intrusive triggers or full-table scans, enabling architectures like event sourcing and data meshes.

Configuring Debezium for PostgreSQL Change Data Capture (CDC) Example Usage
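
Change events arrive as JSON envelopes with before, after and op fields. The sketch below feeds a hand-written sample event (illustrative, not real connector output) through jq, the way a downstream script might extract the operation type and the new row state:

```shell
# Hypothetical Debezium-style change event for public.customers (op "u" = update)
event='{"payload":{"op":"u","before":{"id":1,"email":"old@example.com"},"after":{"id":1,"email":"new@example.com"}}}'

# Pull out the operation type and the post-update email
echo "$event" | jq -r '.payload.op + " " + .payload.after.email'
# → u new@example.com
```

In practice you would pipe kafka-console-consumer output for a topic such as pg1.public.customers into the same jq filter.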



Common Mistakes

  • Changing wal_level to logical and only reloading the configuration; this parameter requires a full PostgreSQL restart.
  • Using legacy settings with Debezium 2.x, such as database.server.name (replaced by topic.prefix) or the wal2json plugin (no longer supported).
  • Leaving the replication slot of a stopped connector in place indefinitely, which forces PostgreSQL to retain WAL until the disk fills.

Frequently Asked Questions (FAQs)

Does Debezium require downtime to enable CDC on PostgreSQL?

No. Changing wal_level from replica to logical requires a restart, but this can be performed during a regular maintenance window. Afterward, Debezium can be added without further downtime.

Can Debezium capture schema changes like ALTER TABLE?

Yes. Debezium emits DDL events to a special schema-changes topic, enabling consumers to adapt automatically or trigger migrations.

How do I recover if the Debezium connector is down for several hours?

As long as the replication slot still exists, PostgreSQL retains the WAL the connector has not yet confirmed, so restarting the connector will fast-forward through the missed LSNs. Set max_slot_wal_keep_size (PostgreSQL 13+) high enough that retained WAL cannot fill the disk during your maximum outage window.

Can I use Galaxy to monitor Debezium’s replication state?

Yes. You can run the replication-slot diagnostics queries shown above inside Galaxy’s SQL editor, save them in a shared Collection, and let teammates rerun or endorse them for operational checks.
