How to Ingest Google Analytics 4 Export Data into BigQuery

Galaxy Glossary

How do I ingest Google Analytics 4 exports into BigQuery?

Ingesting Google Analytics 4 (GA4) exports into BigQuery involves configuring GA4’s native export or a custom pipeline so that raw event data lands in BigQuery tables, where it can be queried with SQL.


Description

Overview

Google Analytics 4 (GA4) offers a native export that streams unsampled event-level data straight into BigQuery. Once the data arrives, analysts can join it with product, marketing, or revenue datasets and power machine-learning pipelines. However, the export is not entirely turnkey: you still have to decide on region, retention, partitioning strategy, costs, and how to handle schema evolution. This glossary entry walks through the entire process—native connector and DIY methods—so that your team reliably ingests GA4 data and can trust it downstream.

Why GA4 → BigQuery Matters

Unlike the aggregated reports available in the GA4 UI, the BigQuery export exposes every user interaction as a row of structured JSON. This granularity unlocks:

  • Attribution modeling that combines web/app events with CRM data.
  • Customer-360 views where behavioral metrics sit next to subscription revenue.
  • Near real-time dashboards for marketing teams.
  • Training datasets for churn or LTV prediction.

Without the export, you are limited to sampled, high-level metrics through the GA4 Reporting API.

Prerequisites

Access

You need Editor or Administrator access in GA4 and BigQuery Admin (or a combination of roles that allow dataset creation and job execution) in Google Cloud.

Billing

The destination Google Cloud project must have an active billing account. The BigQuery sandbox can receive the daily export within its free-tier limits, but the streaming export requires billing to be enabled.

Region Alignment

If your analytics data must reside in the EU or another jurisdiction, select that region as the export’s data location when you link, and keep the BigQuery datasets you plan to join against in the same region.

Method 1: Native GA4 BigQuery Export

Step-by-Step

  1. Open GA4 ▶ Admin ▶ Product Links ▶ BigQuery Links.
  2. Click Link and select the Google Cloud project that hosts—or will host—the dataset.
  3. Choose the daily export, streaming export (intraday), or both.
    • Daily: one batch file per calendar day.
    • Streaming: near real-time micro-batches, useful for dashboards.
  4. Select the data location (US, EU, etc.) to match your BigQuery project.
  5. Review costs. The export itself is free; you pay only for BigQuery storage and query processing.
  6. Confirm. GA4 creates a dataset named analytics_123456789 and starts sending data within 24 hours.

Resulting Schema

You will see tables such as events_YYYYMMDD for daily data and events_intraday_YYYYMMDD for streaming. Each row contains:

  • event_date, event_timestamp
  • event_name
  • user_pseudo_id
  • event_params (REPEATED RECORD)
  • user_properties (REPEATED RECORD)
  • …plus device and geo fields

The schema evolves as new custom dimensions or Google-defined parameters appear. BigQuery automatically adds columns, but downstream SQL must guard against NULLs.
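
One way to guard against missing parameters is a correlated subquery with a NULL-safe default, as in this minimal sketch (the '(not set)' fallback is an arbitrary choice):

-- Pull page_location per event; missing parameters fall back to '(not set)'
SELECT
  event_name,
  COALESCE(
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
    '(not set)'
  ) AS page_location
FROM `myproject.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY));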

Partitioning & Clustering

The native export writes date-sharded tables (events_YYYYMMDD) rather than a single partitioned table. For heavy workloads, consolidate the shards into a date-partitioned table and cluster it on event_name or user_pseudo_id to accelerate selective queries.
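
A consolidation step might look like the following sketch; the destination dataset (myproject.analytics) and table name are placeholders:

-- Build a consolidated, partitioned, clustered copy of the export
CREATE TABLE IF NOT EXISTS `myproject.analytics.events_partitioned`
PARTITION BY event_dt
CLUSTER BY event_name, user_pseudo_id AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_dt,
  *
FROM `myproject.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX NOT LIKE 'intraday%';

A scheduled query can then append each new daily shard by running the same SELECT restricted to yesterday’s _TABLE_SUFFIX.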

Automation & Monitoring

  • Dataform/dbt: Schedule transformation jobs that materialize analytics-ready models.
  • Cloud Monitoring: Alert on “table not updated” scenarios by checking for the presence of yesterday’s shard (a sample check follows this list).
  • Cost Controls: Materialize common aggregates (scheduled queries, or materialized views over a consolidated table) to reduce query costs.
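
The “table not updated” check mentioned above can be a scheduled query against the dataset’s INFORMATION_SCHEMA; wiring the result into an alerting policy is left out of this sketch:

-- Returns 0 if yesterday's daily shard has not landed yet; alert on that condition
SELECT COUNT(*) AS yesterday_shard_present
FROM `myproject.analytics_123456789.INFORMATION_SCHEMA.TABLES`
WHERE table_name = CONCAT(
  'events_',
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
);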

Method 2: DIY Pipeline (when Native Export Is Not Enough)

Cases where a custom pipeline makes sense:

  • You need historical backfill prior to enabling the native export.
  • You require transformations before data hits BigQuery (e.g., PII redaction).
  • You mix GA4 data with other sources in a unified ingestion path (Kafka, Pub/Sub).

Architecture

  1. Pull data via the GA4 Reporting API or Measurement Protocol.
  2. Land raw JSON into Cloud Storage.
  3. Load or stream into BigQuery, either with load jobs run under a WRITE_APPEND disposition (for example via bq load) or with the streaming insert API; a SQL LOAD DATA sketch follows this list.
  4. Maintain schema files in a Git-controlled repository.

Sample Ingestion Code (Python)

from google.analytics.data_v1beta import BetaAnalyticsDataClient
from google.cloud import bigquery

PROPERTY_ID = "123456789"
BQ_TABLE = "myproject.analytics.events_raw"  # columns: event_name STRING, event_date STRING, event_count INT64

ga_client = BetaAnalyticsDataClient()
bq_client = bigquery.Client()

# Pull yesterday's event counts from the GA4 Data API.
request = {
    "property": f"properties/{PROPERTY_ID}",
    "date_ranges": [{"start_date": "yesterday", "end_date": "yesterday"}],
    "dimensions": [{"name": "eventName"}, {"name": "date"}],
    "metrics": [{"name": "eventCount"}],
}
response = ga_client.run_report(request)

# Flatten the API response into plain dicts that match the destination schema.
rows = [
    {
        "event_name": r.dimension_values[0].value,
        "event_date": r.dimension_values[1].value,
        "event_count": int(r.metric_values[0].value),
    }
    for r in response.rows
]

# insert_rows_json streams the rows and returns a list of per-row errors (empty on success).
errors = bq_client.insert_rows_json(BQ_TABLE, rows)
if errors:
    raise RuntimeError(errors)

This script can be wrapped in Cloud Functions or Cloud Run and scheduled via Cloud Scheduler.

Handling Schema Evolution

Use BigQuery’s schemaUpdateOptions=["ALLOW_FIELD_ADDITION"] flag in load jobs, and adopt a late-binding view pattern so that reporting queries do not break when new columns appear.
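
A minimal late-binding view, with placeholder dataset names, might look like this; because it selects *, newly added export columns surface automatically and downstream models can adopt them when ready:

-- Reporting queries reference this view rather than the raw daily shards
CREATE OR REPLACE VIEW `myproject.analytics.events_latest` AS
SELECT
  PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS export_date,
  *
FROM `myproject.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX NOT LIKE 'intraday%';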

Best Practices

  • Use Versioned Datasets: Keep raw, intermediate, and prod layers to isolate changes.
  • Document Custom Dimensions: Store meta-information in a dedicated BigQuery table so that BI tools can auto-discover field meanings.
  • Backfill Strategy: The native export only contains data from the day the BigQuery link is created onward; script the Reporting API (or replay events via the Measurement Protocol) to reconstruct earlier history.
  • Partition Pruning: Always filter on _TABLE_SUFFIX when running day-level queries. This avoids scanning every daily shard.
  • Security: Mask IP addresses or URLs containing PII before sharing datasets broadly (sketched below).
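
For the masking step, one option is a view that strips query strings (which often carry emails or tokens) from page URLs before they are shared; the dataset and view names here are placeholders:

-- Expose page paths without query strings to downstream consumers
CREATE OR REPLACE VIEW `myproject.shared.page_views_masked` AS
SELECT
  event_date,
  user_pseudo_id,
  REGEXP_REPLACE(
    (SELECT value.string_value FROM UNNEST(event_params) WHERE key = 'page_location'),
    r'\?.*$', ''
  ) AS page_path
FROM `myproject.analytics_123456789.events_*`
WHERE event_name = 'page_view';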

Practical Example

-- Aggregate weekly active users (WAU) for last 12 weeks
WITH last_12_weeks AS (
  SELECT
    PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS event_date,
    user_pseudo_id
  FROM `myproject.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 84 DAY))
                          AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  GROUP BY event_date, user_pseudo_id
)
SELECT
  FORMAT_DATE('%G-W%V', event_date) AS iso_week,
  COUNT(DISTINCT user_pseudo_id) AS wau
FROM last_12_weeks
GROUP BY iso_week
ORDER BY iso_week;

You can paste this query into any SQL editor. If you use the Galaxy desktop client, its AI copilot can automatically convert the date math to your company’s fiscal calendar or suggest clustering columns after it inspects the dataset.

Common Mistakes & How to Fix Them

1. Ignoring Intraday Tables

Why it’s wrong: Dashboards built only on daily tables show a 24-hour delay.
Fix: UNION the events_intraday_* tables or build a view that falls back when the daily table isn’t ready.
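
A fallback view along these lines (names are placeholders) keeps dashboards current without double counting, because GA4 removes each intraday table once the corresponding daily shard is finalized:

-- Combine finalized daily shards with today's intraday rows
CREATE OR REPLACE VIEW `myproject.analytics.events_with_today` AS
SELECT * FROM `myproject.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX NOT LIKE 'intraday%'
UNION ALL
SELECT * FROM `myproject.analytics_123456789.events_intraday_*`;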

2. Hard-Coding Schema

Why it’s wrong: GA4 adds new columns silently, breaking ETL scripts that expect fixed schemas.
Fix: Use SELECT * EXCEPT or treat new fields as JSON blobs until modeled.

3. Full Table Scans

Why it’s wrong: Forgetting _TABLE_SUFFIX or partition filters leads to terabytes scanned and high bills.
Fix: Wrap common predicates in authorized views or UDFs so analysts must supply date filters.
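
If you maintain a consolidated partitioned table (such as the events_partitioned sketch earlier), you can also have BigQuery reject unfiltered queries outright:

-- Queries against the consolidated table now fail unless they filter on the partition column
ALTER TABLE `myproject.analytics.events_partitioned`
SET OPTIONS (require_partition_filter = TRUE);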

Working Query Example

-- Identify top landing pages by sessions and purchases in the last 30 days
SELECT
  ep.value.string_value AS landing_page,
  COUNTIF(event_name = 'session_start') AS sessions,
  COUNTIF(event_name = 'purchase') AS purchases
FROM `myproject.analytics_123456789.events_*`,
  UNNEST(event_params) AS ep
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
                        AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
  AND ep.key = 'page_location'
GROUP BY landing_page
ORDER BY purchases DESC
LIMIT 25;

Galaxy & GA4 Data Exploration

While ingestion occurs on Google Cloud, exploration often happens in a SQL editor. Galaxy’s desktop client connects directly to BigQuery. Its AI copilot can:

  • Auto-generate SQL that respects _TABLE_SUFFIX and partition pruning.
  • Suggest new clustering keys after inspecting query plans.
  • Explain GA4 nested fields (arrays) in plain English.
  • Allow teams to Endorse a WAU query so everyone re-uses the verified version.

Conclusion

GA4’s BigQuery export removes the ceiling that sampling once imposed on web analytics. Whether you rely on the native link or build a custom pipeline, adhering to partitioning best practices, schema governance, and cost monitoring ensures that your marketing and product teams gain fast, trustworthy insights. Integrating a modern SQL editor such as Galaxy further shortens the path from raw events to decision-ready dashboards.

Why How to Ingest Google Analytics 4 Export Data into BigQuery is important

GA4’s direct export into BigQuery is the only free, unsampled path to event-level data. Mastering the ingestion process unlocks advanced attribution, real-time dashboards, and ML use cases while avoiding costly query mistakes and schema-drift outages.

How to Ingest Google Analytics 4 Export Data into BigQuery Example Usage


SELECT COUNT(*) FROM `myproject.analytics_123456789.events_*` WHERE event_name = 'purchase' AND _TABLE_SUFFIX = '20240213';

Frequently Asked Questions (FAQs)

How long after linking does data appear in BigQuery?

Initial data usually lands within 24 hours. For streaming exports, the lag is often under one hour.

Does the GA4 export cost money?

The export feature itself is free, but you pay for BigQuery storage and query processing. The streaming (intraday) export additionally incurs BigQuery streaming-insert charges, because rows are written continuously instead of in one daily batch.

How can I backfill data collected before I enabled the native export?

The native export cannot be backfilled; it only contains events from the day the BigQuery link is created onward. Use the GA4 Data API (Reporting API) to retrieve historical aggregates, or replay events through the Measurement Protocol combined with Cloud Storage + BigQuery to approximate raw event history.

Can Galaxy help me query GA4 data faster?

Yes. Galaxy’s BigQuery connector lets you explore GA4 datasets with AI-generated SQL that automatically handles nested fields and partition filters. You can also share endorsed queries in Collections so teams avoid reinventing the wheel.
