Debezium Postgres Setup Tutorial: Change Data Capture Made Easy

Galaxy Glossary

How do I set up Debezium on PostgreSQL for real-time change data capture?

A step-by-step guide to configuring Debezium with PostgreSQL to stream real-time change data capture events through Kafka.


Description

Debezium Postgres Setup Tutorial

This guide walks you through installing, configuring, and running Debezium on PostgreSQL so you can stream every INSERT, UPDATE, and DELETE to downstream consumers in near real-time.

Why Debezium + Postgres?

PostgreSQL is the backbone of countless transactional systems. Yet analytics, search, and microservices benefit from fresh data delivered as soon as it changes. Change Data Capture (CDC) bridges that gap. Debezium is the de-facto open-source CDC platform that leverages Postgres’s logical replication to publish row-level changes to Apache Kafka topics. From there, you can feed dashboards, alerting pipelines, search indexes, or a modern SQL editor such as Galaxy for ad-hoc analysis on a replica data mart.

Prerequisites

Software Versions

  • PostgreSQL 10+ (logical replication introduced in 10)
  • Apache Kafka 2.8+ and Kafka Connect
  • Debezium Connector plug-in matching your Kafka Connect version
  • Java 11+

System Resources

  • At least 4 GB RAM and 2 vCPUs for a local test cluster
  • Network ports: 5432 (Postgres), 9092 (Kafka), 8083 (Kafka Connect REST)

Step-by-Step Setup

1. Enable Logical Replication in PostgreSQL

  1. Edit postgresql.conf:
     wal_level = logical
     max_replication_slots = 10
     max_wal_senders = 10
  2. Update pg_hba.conf so the replication user can connect (the example below allows any host; restrict the CIDR range in production): host replication debezium 0.0.0.0/0 md5
  3. Restart PostgreSQL for the changes to take effect.
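
After restarting, you can confirm the settings took effect with a quick check from psql:

```sql
-- Confirm logical decoding is enabled
SHOW wal_level;              -- expect: logical
SHOW max_replication_slots;  -- expect: 10
SHOW max_wal_senders;        -- expect: 10
```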

2. Create a Replication User

CREATE ROLE debezium WITH LOGIN PASSWORD 'dbz';
ALTER ROLE debezium WITH REPLICATION;
GRANT ALL PRIVILEGES ON DATABASE inventory TO debezium;
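
Debezium can create the replication slot and publication automatically (see step 4), but if you prefer to manage them explicitly, e.g. under tighter privileges, you can pre-create both. The names below are illustrative and must match the connector's slot.name and publication settings (the connector's default publication name is dbz_publication):

```sql
-- Pre-create the logical replication slot using the pgoutput plug-in
SELECT pg_create_logical_replication_slot('inventory_slot', 'pgoutput');

-- Pre-create a publication limited to the captured tables
CREATE PUBLICATION dbz_publication FOR TABLE public.customers, public.orders;
```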

3. Install and Start Kafka + Kafka Connect

Download a Kafka distribution, extract, and run:

bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &

Start Kafka Connect in standalone or distributed mode, making sure to include the Debezium Postgres connector JARs in the plugin.path.
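
A minimal distributed worker configuration might look like the following excerpt; the paths and converter choices are illustrative, so adjust plugin.path to wherever you extracted the Debezium connector archive:

```properties
# config/connect-distributed.properties (excerpt)
bootstrap.servers=localhost:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
offset.storage.topic=connect-offsets
config.storage.topic=connect-configs
status.storage.topic=connect-status
# Directory containing the debezium-connector-postgres JARs
plugin.path=/opt/kafka/plugins
```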

4. Register the Debezium Connector

POST the following JSON to the Kafka Connect REST API (http://localhost:8083/connectors):

{
  "name": "inventory-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "dbz",
    "database.dbname": "inventory",
    "database.server.name": "pg",
    "plugin.name": "pgoutput",
    "slot.name": "inventory_slot",
    "publication.autocreate.mode": "filtered",
    "table.include.list": "public.customers,public.orders"
  }
}

Kafka Connect responds with a 201 Created if successful. Debezium will create a publication and a replication slot automatically (unless you pre-create them).

5. Verify the Stream

Insert a test row in Postgres:

INSERT INTO customers (first_name, last_name, email)
VALUES ('Ada', 'Lovelace', 'ada@example.com');

Consume the Kafka topic:

bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
--topic pg.public.customers --from-beginning

You should see a JSON envelope with before/after payloads describing the insert.
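
A Debezium change event wraps the row data in an envelope with before, after, op, and ts_ms fields. The sketch below, in Python purely for illustration, shows how a consumer might unpack one; the sample payload is hand-written to mirror the envelope shape, not captured output:

```python
import json

# A hand-written sample of a Debezium insert event's value (payload portion only)
event = json.loads("""
{
  "payload": {
    "before": null,
    "after": {"id": 1, "first_name": "Ada", "last_name": "Lovelace",
              "email": "ada@example.com"},
    "source": {"table": "customers", "schema": "public"},
    "op": "c",
    "ts_ms": 1700000000000
  }
}
""")

def describe(change: dict) -> str:
    """Map Debezium op codes to a human-readable action."""
    ops = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT READ"}
    payload = change["payload"]
    action = ops.get(payload["op"], payload["op"])
    row = payload["after"] or payload["before"]  # deletes carry only "before"
    return f"{action} on {payload['source']['table']}: {row}"

print(describe(event))
```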

Best Practices

Use pgoutput Unless You Need wal2json

Since PostgreSQL 10, pgoutput is the native logical decoding plug-in, offering better performance, fewer dependencies, and compatibility with logical replication slots. Use wal2json only when you need its JSON output format or must support Postgres versions older than 10.

Keep Slots and Publications Managed

Abandoned slots accumulate WAL and can fill disks. Monitor pg_replication_slots and drop unused slots when disconnecting a connector.
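
A query like the following (using PostgreSQL 10+ functions) shows each slot and roughly how much WAL it is retaining:

```sql
SELECT slot_name,
       active,
       pg_size_pretty(
         pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
       ) AS retained_wal
FROM pg_replication_slots;
```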

Separate Infrastructure Networks

Run Kafka + Connect in the same VPC/subnet as Postgres to minimize latency and avoid exposing the database publicly.

Schema Registry & Compatibility Modes

If you use Confluent Schema Registry or Redpanda’s schema service, configure Debezium Avro serialization and enforce BACKWARD compatibility to keep downstream consumers happy as schemas evolve.

Common Mistakes & How to Fix Them

Connector Fails with "FATAL: replication slot ... is active"

Multiple connectors can’t share a single replication slot. Give each connector a unique slot.name, or drop the stale slot left behind by a crashed or deleted connector before restarting.
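
If a defunct connector left its slot behind, you can remove it manually once you are sure nothing is using it (the slot name here matches the earlier example config):

```sql
-- The slot must be inactive before it can be dropped
SELECT pg_drop_replication_slot('inventory_slot');
```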

"WAL Level Is Not Logical" After Deploy

This indicates wal_level wasn’t updated or Postgres wasn’t restarted. Edit postgresql.conf and restart the service.

Kafka Topics Stay Empty

If the initial snapshot is disabled and no live changes occur, topics remain empty. Set snapshot.mode=initial or perform a dummy update to trigger an event.

Monitoring & Troubleshooting

Expose JMX metrics from Kafka Connect to Prometheus. Key Debezium metrics include TotalNumberOfEventsSeen, QueueRemainingCapacity, and MilliSecondsBehindSource. Log levels can be increased by editing log4j.properties on the Connect classpath.

Consuming the CDC Stream

Once Kafka topics are populated, you can:

  • Sink into Redshift, BigQuery, or Snowflake via Kafka Connect sink connectors for analytics.
  • Push to Elasticsearch for full-text search.
  • Feed Galaxy via a materialized Postgres replica or data warehouse so engineers can explore the freshest data using Galaxy’s lightning-fast SQL editor and AI copilot.

Security Considerations

  • Store connector credentials in secrets managers, not plain JSON.
  • Enable TLS encryption for Kafka and Postgres connections; configure sslmode=require on the connector.
  • Use network ACLs to restrict Kafka Connect traffic to the database.
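
On the Debezium side, Postgres TLS is controlled by the database.sslmode connector property; an excerpt of the connector config might look like this (the certificate path is illustrative):

```json
{
  "database.sslmode": "require",
  "database.sslrootcert": "/etc/ssl/certs/postgres-ca.pem"
}
```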

Conclusion

With Postgres configured for logical replication, Kafka + Connect up and running, and Debezium registered, you now have a robust CDC pipeline. Real-time data opens doors for reactive microservices, audit trails, and blazing-fast analytics. When paired with a modern SQL editor like Galaxy, your team can query the latest facts without waiting on nightly ETL batches.

Why Debezium Postgres Setup Tutorial: Change Data Capture Made Easy is important

Demand for real-time analytics, microservice data synchronization, and audit compliance has made Change Data Capture a critical capability. Debezium leverages native PostgreSQL replication to stream changes without intrusive triggers or batch ETL, providing a low-latency, scalable foundation for event-driven architectures.

Debezium Postgres Setup Tutorial: Change Data Capture Made Easy Example Usage


curl -X POST -H "Content-Type: application/json" \
  --data @inventory-connector.json \
  http://localhost:8083/connectors

Frequently Asked Questions (FAQs)

Is Debezium open source?

Yes. Debezium is licensed under Apache 2.0 and maintained by Red Hat and a community of contributors.

Do I need superuser privileges on PostgreSQL?

No. Logical replication only requires a role with REPLICATION privilege and access to the target database—not full superuser rights.

How does Galaxy relate to Debezium?

Galaxy consumes the near-real-time data delivered by Debezium into your analytics warehouse, letting developers query the freshest state with its modern SQL editor and AI copilot.

Can Debezium capture DDL changes?

Debezium can surface schema changes, but support varies by connector. The PostgreSQL connector reflects column additions and type changes in the schemas of subsequent change events; the dedicated schema-change topic enabled by include.schema.changes is a feature of connectors such as MySQL and SQL Server, so consumers of Postgres topics should be prepared to handle evolving event schemas instead.
