How to Schedule Jobs in ClickHouse

Galaxy Glossary

How do I automate recurring ClickHouse tasks with cron or Airflow?

Scheduling jobs in ClickHouse means automating recurring tasks—such as OPTIMIZE, DELETE, or reporting queries—using cron or an external orchestrator.

Sign up for the latest in SQL knowledge from the Galaxy Team!
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.

Description

Table of Contents

How do I schedule a ClickHouse query with cron?

Create a shell script or inline cron entry that calls clickhouse-client with the SQL you want to run. Cron handles the timing; ClickHouse executes the query.

What is the basic cron format?

* * * * * user command represents minute, hour, day-of-month, month, and day-of-week. Replace command with a clickhouse-client call.

Why use a wrapper script instead of inline SQL?

Wrapper scripts let you version-control SQL, add logging, and reuse connection flags. They also keep long queries readable.

When should I run OPTIMIZE TABLE jobs?

Schedule OPTIMIZE TABLE Orders FINAL during off-peak hours—typically early morning—so merges do not impact user queries.

How can I automate data purging with TTL?

Add a TTL order_date + INTERVAL 90 DAY DELETE clause when you create a table. ClickHouse removes rows automatically, eliminating the need for manual DELETE jobs.

Which orchestrators integrate smoothly?

Airflow, Dagster, or Prefect trigger ClickHouse jobs via Python operators. Use connection pools and retries to handle transient failures.

What best practices keep jobs safe?

Test queries on staging data, use --readonly=2 in dry-runs, write idempotent SQL, and log both STDOUT and STDERR for auditability.

Why How to Schedule Jobs in ClickHouse is important

How to Schedule Jobs in ClickHouse Example Usage


-- Weekly sales snapshot inserted by cron
INSERT INTO WeeklySales
SELECT toStartOfWeek(order_date)  AS week,
       SUM(total_amount)          AS revenue,
       COUNT()                    AS orders
FROM Orders
GROUP BY week;

How to Schedule Jobs in ClickHouse Syntax


# Cron entry to optimize the Orders table daily at 03:00 UTC
0 3 * * * clickhouse-client \
  --host=clickhouse.prod.local \
  --user=reporter \
  --password=$CLICKHOUSE_PW \
  --query="OPTIMIZE TABLE Orders FINAL;"

# Shell wrapper (optimize_orders.sh)
#!/bin/bash
clickhouse-client \
  --host=clickhouse.prod.local \
  --user=reporter \
  --password=$CLICKHOUSE_PW \
  --query="OPTIMIZE TABLE Orders FINAL;"

# Add wrapper to cron, running every Sunday at 01:00
0 1 * * 0 /usr/local/bin/optimize_orders.sh >> /var/log/clickhouse/optimize.log 2>&1

Common Mistakes

Frequently Asked Questions (FAQs)

Can I use ClickHouse's internal scheduler?

No. ClickHouse lacks a native job scheduler; use cron or an orchestrator.

How do I pass multiple statements?

Use --multiquery or separate statements with semicolons inside the quoted --query string.

What if the job fails?

Redirect STDERR, enable cron email alerts, and set retries in Airflow to catch and rerun failed jobs.

Want to learn about other SQL terms?

Trusted by top engineers on high-velocity teams
Aryeo Logo
Assort Health
Curri
Rubie Logo
Bauhealth Logo
Truvideo Logo
Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.