A step-by-step guide to configuring Debezium with PostgreSQL to stream real-time change data capture events through Kafka.
This guide walks you through installing, configuring, and running Debezium on PostgreSQL so you can stream every INSERT, UPDATE, and DELETE to downstream consumers in near real-time.
PostgreSQL is the backbone of countless transactional systems. Yet analytics, search, and microservices benefit from fresh data delivered as soon as it changes. Change Data Capture (CDC) bridges that gap. Debezium is the de-facto open-source CDC platform that leverages Postgres’s logical replication to publish row-level changes to Apache Kafka topics. From there, you can feed dashboards, alerting pipelines, search indexes, or a modern SQL editor such as Galaxy for ad-hoc analysis on a replica data mart.
Enable logical decoding in postgresql.conf (a restart is required for this setting to take effect):

```
wal_level = logical
```

Then update pg_hba.conf to allow the replication user access:

```
host replication debezium 0.0.0.0/0 md5
```

Finally, create a dedicated role and grant it the privileges it needs:

```sql
CREATE ROLE debezium WITH LOGIN PASSWORD 'dbz';
ALTER ROLE debezium WITH REPLICATION;
GRANT ALL PRIVILEGES ON DATABASE inventory TO debezium;
```
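After restarting Postgres, it is worth confirming the settings took effect. A quick check from psql (using the role name from the example above):

```sql
-- Confirm logical decoding is enabled (requires a restart after editing postgresql.conf)
SHOW wal_level;   -- expect: logical

-- Confirm the debezium role exists and is allowed to replicate
SELECT rolname, rolreplication FROM pg_roles WHERE rolname = 'debezium';
```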
Download a Kafka distribution, extract it, and start ZooKeeper and the broker:

```shell
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &
```
Start Kafka Connect in standalone or distributed mode, making sure the Debezium Postgres connector JARs are on the `plugin.path`.
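As a sketch, the relevant lines of a distributed-mode worker config might look like the following; the plugin directory path is an assumption, so point it at wherever you extracted the Debezium connector archive:

```properties
# connect-distributed.properties (excerpt)
bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Directory containing the debezium-connector-postgres JARs (illustrative path)
plugin.path=/opt/kafka/connect-plugins
```

Then start the worker with `bin/connect-distributed.sh config/connect-distributed.properties`.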
POST the following JSON to the Kafka Connect REST API (http://localhost:8083/connectors by default):
```json
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.dbname": "inventory",
    "database.server.name": "pg",
    "plugin.name": "pgoutput",
    "slot.name": "inventory_slot",
    "publication.autocreate.mode": "filtered",
    "table.include.list": "public.customers,public.orders"
  }
}
```
Kafka Connect responds with `201 Created` if successful. Debezium will create a publication and a replication slot automatically (unless you pre-create them).
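For repeatability, you can keep the connector config in a file and validate it before posting. A minimal sketch (the filename is arbitrary, and the curl call assumes Connect's default REST port 8083):

```shell
# Write the connector config to a file (abridged to the core fields)
cat > inventory-connector.json <<'EOF'
{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.dbname": "inventory",
    "database.server.name": "pg",
    "plugin.name": "pgoutput",
    "slot.name": "inventory_slot",
    "table.include.list": "public.customers,public.orders"
  }
}
EOF

# Sanity-check the JSON before sending it anywhere
python3 -m json.tool inventory-connector.json > /dev/null && echo "config OK"

# Register it once Kafka Connect is listening:
# curl -s -X POST -H "Content-Type: application/json" \
#      --data @inventory-connector.json http://localhost:8083/connectors
```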
Insert a test row in Postgres:

```sql
INSERT INTO customers(first_name, last_name, email)
VALUES ('Ada', 'Lovelace', 'ada@example.com');
```
Consume the Kafka topic:

```shell
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic pg.public.customers --from-beginning
```
You should see a JSON envelope with `before`/`after` payloads describing the insert.
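Abridged, a create event for that row looks roughly like the following (field values such as `id` and `ts_ms` are illustrative):

```json
{
  "payload": {
    "before": null,
    "after": {
      "id": 1001,
      "first_name": "Ada",
      "last_name": "Lovelace",
      "email": "ada@example.com"
    },
    "source": { "db": "inventory", "schema": "public", "table": "customers" },
    "op": "c",
    "ts_ms": 1700000000000
  }
}
```

For an INSERT, `before` is null and `op` is `c`; updates carry both images, subject to the table's REPLICA IDENTITY setting.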
Since PostgreSQL 10, `pgoutput` is the native logical decoding plug-in, offering better performance, fewer dependencies, and compatibility with logical replication slots. Use `wal2json` only when you need its JSON structures or must support older Postgres versions.
Abandoned slots accumulate WAL and can fill disks. Monitor `pg_replication_slots` and drop unused slots when disconnecting a connector.
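A quick way to spot the problem, sketched against the standard catalog view:

```sql
-- How much WAL is each slot retaining? (PostgreSQL 10+)
SELECT slot_name, active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;

-- Remove a slot only after its connector is permanently decommissioned
SELECT pg_drop_replication_slot('inventory_slot');
```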
Run Kafka + Connect in the same VPC/subnet as Postgres to minimize latency and avoid exposing the database publicly.
If you use Confluent Schema Registry or Redpanda's schema service, configure Debezium with Avro serialization and enforce `BACKWARD` compatibility to keep downstream consumers happy as schemas evolve.
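As an illustrative worker-config fragment using Confluent's Avro converter classes (the registry URL is an assumption for a local setup):

```properties
key.converter=io.confluent.connect.avro.AvroConverter
value.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://localhost:8081
value.converter.schema.registry.url=http://localhost:8081
```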
Multiple connectors can't share a single replication slot. Either reuse the same connector or create a unique `slot.name` for each.
An error such as "logical decoding requires wal_level >= logical" indicates `wal_level` wasn't updated or Postgres wasn't restarted. Edit postgresql.conf, set `wal_level = logical`, and restart the service.
If initial snapshots are disabled and no live changes occur, topics remain empty. Enable `snapshot.mode=initial` or perform a dummy update to trigger an event.
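A no-op touch is enough to generate an event; for example (assuming the `customers` table from earlier and that a row with `id` 1001 exists):

```sql
-- Rewrites the row without changing its data, which still emits an UPDATE event
UPDATE customers SET email = email WHERE id = 1001;
```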
Expose JMX metrics from Kafka Connect to Prometheus. Key Debezium metrics include `NumberOfMessagesProduced`, `QueueRemainingCapacity`, and `MilliSecondsBehindSource`. Log levels can be increased by editing `log4j.properties` on the Connect class-path.
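One common way to expose those MBeans is via the standard JVM remote-JMX flags, sketched here; port 9999 is an assumption, and authentication is disabled for brevity, so don't use this outside a trusted network:

```shell
# Kafka's launch scripts pass KAFKA_JMX_OPTS through to the JVM
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9999 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"

# Then start Connect as usual:
# bin/connect-distributed.sh config/connect-distributed.properties
```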
Once Kafka topics are populated, downstream services can subscribe and react to every change. To encrypt traffic between the connector and the database, set `sslmode=require` on the connector.

With Postgres configured for logical replication, Kafka + Connect up and running, and Debezium registered, you now have a robust CDC pipeline. Real-time data opens doors for reactive microservices, audit trails, and blazing-fast analytics. When paired with a modern SQL editor like Galaxy, your team can query the latest facts without waiting on nightly ETL batches.
Demand for real-time analytics, microservice data synchronization, and audit compliance has made Change Data Capture a critical capability. Debezium leverages native PostgreSQL replication to stream changes without intrusive triggers or batch ETL, providing a low-latency, scalable foundation for event-driven architectures.
Is Debezium open source? Yes. Debezium is licensed under Apache 2.0 and maintained by Red Hat and a community of contributors.
Does Debezium need superuser access to Postgres? No. Logical replication only requires a role with the REPLICATION privilege and access to the target database, not full superuser rights.
How does Galaxy fit into a CDC pipeline? Galaxy consumes the near-real-time data delivered by Debezium into your analytics warehouse, letting developers query the freshest state with its modern SQL editor and AI copilot.
Can Debezium capture schema changes? Yes, Debezium emits schema change events, but you must enable `include.schema.changes=true` and ensure consumers handle them.