How to Implement CI/CD with BigQuery

Galaxy Glossary

How do I set up CI/CD for BigQuery SQL?

CI/CD with BigQuery automates testing and deployment of SQL, datasets, and views through version-controlled pipelines.



Why use CI/CD for BigQuery projects?

Automating testing and deployment prevents ad-hoc scripts, enforces review, and lets engineers roll back safely. Pipelines validate SQL, run unit tests against temporary datasets, and promote artifacts to production when checks pass.

Which tools integrate best with BigQuery CI/CD?

GitHub Actions, GitLab CI, and Cloud Build provide managed runners that can call the bq CLI and Terraform. Combine them with dbt Core for model tests or with sqlfluff for linting.

How do I structure my repository?

Keep SQL files in /sql, dbt models in /models, and Terraform in /infra. Use one folder per logical area (e.g., orders, customers). Store test data as CSV in /tests/fixtures for deterministic unit tests.
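Under these conventions, a repository might look like the following sketch (subfolder names beyond those mentioned above are illustrative):

```
.
├── sql/            # ad-hoc queries and view definitions
│   ├── orders/
│   └── customers/
├── models/         # dbt models and schema tests
├── infra/          # Terraform for datasets, tables, IAM
├── tests/
│   └── fixtures/   # CSV fixture data for deterministic unit tests
└── .github/
    └── workflows/
        └── bigquery.yml
```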

What does a minimal GitHub Actions workflow look like?

Use a .github/workflows/bigquery.yml file that installs the Cloud SDK, authenticates with a service account, lints SQL, executes unit tests on a sandbox dataset, and deploys to prod on the main branch.

Step 1 – Install Cloud SDK

The google-github-actions/setup-gcloud action installs the SDK and bq CLI so subsequent steps can run BigQuery commands.

Step 2 – Authenticate non-interactively

Store a JSON service-account key as a GitHub secret. Export GOOGLE_APPLICATION_CREDENTIALS so Terraform, dbt, and raw bq commands can authenticate.
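As a sketch, this can be a single step using the google-github-actions/auth action, which writes the key to disk and exports GOOGLE_APPLICATION_CREDENTIALS for subsequent steps automatically (the secret name GCP_SA_KEY is illustrative):

```yaml
# Authenticate once; later steps inherit GOOGLE_APPLICATION_CREDENTIALS.
- name: Authenticate to Google Cloud
  uses: google-github-actions/auth@v1
  with:
    credentials_json: ${{ secrets.GCP_SA_KEY }}  # secret name is illustrative
```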

Step 3 – Lint and unit-test SQL

Run sqlfluff lint sql/ --dialect bigquery and dbt test against a temporary dataset such as ci_$(date +%s). Fail the job on any error.
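The dataset-naming convention can be sketched in shell; the epoch-second suffix yields a unique, valid BigQuery dataset id per run (the sqlfluff and dbt invocations are shown as comments since they require those tools to be installed):

```shell
# Build a unique sandbox dataset name; the epoch-second suffix keeps
# concurrent CI runs from colliding.
DATASET="ci_$(date +%s)"

# BigQuery dataset ids allow only letters, digits, and underscores;
# verify the generated name before using it.
echo "$DATASET" | grep -Eq '^ci_[0-9]+$' && echo "dataset name ok"

# In CI you would then run, e.g.:
#   sqlfluff lint sql/ --dialect bigquery
#   dbt test --target ci
```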

Step 4 – Deploy to production

On push to main or a tagged release, run terraform apply -auto-approve (or dbt run) to create views, tables, or authorized routines in the prod dataset.
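A minimal Terraform sketch of such prod resources might look like this (the dataset and view names are illustrative, not from this article):

```hcl
# Hypothetical sketch: a prod dataset and a view managed by Terraform.
resource "google_bigquery_dataset" "prod" {
  dataset_id = "analytics_prod"
  location   = "US"
}

resource "google_bigquery_table" "orders_summary" {
  dataset_id = google_bigquery_dataset.prod.dataset_id
  table_id   = "orders_summary"

  view {
    query          = "SELECT order_id, SUM(amount) AS total FROM `my-gcp-project.analytics_prod.orders` GROUP BY order_id"
    use_legacy_sql = false
  }
}
```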

How do I test queries against ecommerce tables?

Create staging datasets such as ci_customers, insert fixture rows for Customers, Orders, Products, and OrderItems, then run assertions verifying record counts or totals.
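BigQuery's scripting ASSERT statement is one way to express such checks; the table names and expected counts below are illustrative:

```sql
-- Hypothetical fixture assertions against a staging dataset.
ASSERT (SELECT COUNT(*) FROM `ci_customers.Customers`) = 4
  AS 'Customers fixture should contain exactly 4 rows';

ASSERT (
  SELECT SUM(oi.quantity * p.price)
  FROM `ci_customers.OrderItems` AS oi
  JOIN `ci_customers.Products`   AS p ON p.id = oi.product_id
) > 0
  AS 'Order totals should be positive';
```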

Best practices for reliable pipelines

Assign least-privilege service accounts, cache the Cloud SDK layer, clean up temp datasets at job end, and publish coverage metrics to keep pipelines fast and secure.
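For example, a cleanup step can be sketched in the workflow so the temp dataset is removed even when earlier steps fail (project id and env variable name are illustrative):

```yaml
# Runs regardless of earlier failures; -r removes tables, -f skips the prompt.
- name: Clean up temp dataset
  if: always()
  run: bq rm -r -f -d "my-gcp-project:${BIGQUERY_DATASET}"
```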

What if deployment fails?

Use Terraform state rollbacks or dbt full-refresh to revert schema changes. Snapshot production datasets nightly so you can restore data quickly.
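Nightly snapshots can use BigQuery table snapshots, which are cheap, read-only copies you can later restore from with CREATE TABLE ... CLONE; the names and retention below are illustrative:

```sql
-- Hypothetical nightly snapshot with a 7-day retention.
CREATE SNAPSHOT TABLE `analytics_backup.Orders_20240101`
CLONE `analytics.Orders`
OPTIONS (
  expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
);
```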



How to Implement CI/CD with BigQuery Example Usage


-- Example: verify customer lifetime value after deployment
SELECT c.name,
       SUM(oi.quantity * p.price) AS lifetime_value
FROM   `analytics.Customers`   AS c
JOIN   `analytics.Orders`      AS o  ON o.customer_id = c.id
JOIN   `analytics.OrderItems`  AS oi ON oi.order_id    = o.id
JOIN   `analytics.Products`    AS p  ON p.id           = oi.product_id
WHERE  c.created_at >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 YEAR)
GROUP  BY c.name
ORDER  BY lifetime_value DESC
LIMIT  20;

How to Implement CI/CD with BigQuery Syntax


# GitHub Actions Workflow (bigquery.yml)
name: bigquery-ci
on:
  pull_request:
  push:
    branches: [main]

jobs:
  test-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Authenticate to Google Cloud
        uses: google-github-actions/auth@v1
        with:
          credentials_json: ${{ secrets.GCP_SA_KEY }}  # secret name is illustrative

      - name: Set up gcloud
        uses: google-github-actions/setup-gcloud@v1
        with:
          version: '>= 456.0.0'
          project_id: my-gcp-project

      - name: Lint SQL
        run: |
          pip install sqlfluff
          sqlfluff lint sql/ --dialect bigquery

      - name: Unit tests with dbt
        run: |
          pip install dbt-bigquery
          # the "ci" target in profiles.yml is expected to read BIGQUERY_DATASET
          export BIGQUERY_DATASET=ci_$(date +%s)
          dbt seed --target ci
          dbt run  --target ci
          dbt test --target ci

      - name: Deploy to prod (main branch only)
        if: github.ref == 'refs/heads/main'
        run: |
          terraform -chdir=infra init
          terraform -chdir=infra apply -auto-approve

Common Mistakes

Frequently Asked Questions (FAQs)

Can I run CI/CD with BigQuery using only Cloud Build?

Yes. Cloud Build natively authenticates to BigQuery and can execute bash, Terraform, or dbt steps—reducing external dependencies.

How do I speed up dbt tests in CI?

Materialize models as views in CI, limit fixture row counts, and cache the Python environment between jobs.
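One way to do this, assuming a dbt project named my_project, is a conditional materialization in dbt_project.yml:

```yaml
# Hypothetical dbt_project.yml fragment: build views in the ci target,
# tables everywhere else.
models:
  my_project:
    +materialized: "{{ 'view' if target.name == 'ci' else 'table' }}"
```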

Is there a free tier for BigQuery CI environments?

Yes. The BigQuery sandbox includes 10 GB of storage and 1 TB of query processing per month at no cost, so small CI runs are effectively free.
