Ingesting Google Analytics 4 (GA4) exports into BigQuery involves configuring GA4’s native export or a custom pipeline so that raw event data lands in BigQuery tables, where it can be queried with SQL.
Google Analytics 4 (GA4) offers a native export that streams unsampled event-level data straight into BigQuery. Once the data arrives, analysts can join it with product, marketing, or revenue datasets and power machine-learning pipelines. However, the export is not entirely turnkey: you still have to decide on region, retention, partitioning strategy, costs, and how to handle schema evolution. This glossary entry walks through the entire process—native connector and DIY methods—so that your team reliably ingests GA4 data and can trust it downstream.
Unlike the aggregated reports available in the GA4 UI, the BigQuery export exposes every user interaction as a row with nested, repeated fields. This granularity unlocks custom attribution modeling, joins with product and revenue data, real-time dashboards, and machine-learning features.
Without the export, you are limited to sampled, high-level metrics through the GA4 Reporting API.
You need Editor or Administrator access in GA4 and BigQuery Admin (or a combination of roles that allow dataset creation and job execution) in Google Cloud.
The destination Google Cloud project must have the BigQuery API enabled; billing must also be enabled on that project if you want the streaming (intraday) export, while the free BigQuery sandbox supports only the daily export within its limits.
If your analytics data must reside in the EU or another jurisdiction, choose that location for the export dataset when you configure the link; a dataset’s location cannot be changed afterwards.
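The native link creates the export dataset for you, but any companion datasets you later add for transformed or modeled data should be pinned to the same location so joins stay in-region. A minimal sketch, assuming a hypothetical analytics_modeled dataset:

-- Companion dataset for modeled GA4 data; name and location are illustrative.
CREATE SCHEMA IF NOT EXISTS `myproject.analytics_modeled`
OPTIONS (
  location = 'EU',
  description = 'Modeled GA4 data, co-located with the raw export'
);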
Once the link is active, GA4 creates a dataset named analytics_123456789 (the suffix is your property ID) and starts sending data within 24 hours. You will see tables such as events_YYYYMMDD for daily data and events_intraday_YYYYMMDD for streaming. Each row contains:
- event_date
- event_timestamp
- event_name
- user_pseudo_id
- event_params (REPEATED RECORD)
- user_properties (REPEATED RECORD)

The schema evolves as new custom dimensions or Google-defined parameters appear. BigQuery automatically adds columns, but downstream SQL must guard against NULLs.
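Because event_params is a repeated key-value record, downstream queries typically pivot out individual keys and must tolerate their absence. A minimal sketch that returns NULL when a key is missing (the dataset name matches the example property above):

-- Extract page_location per event; a missing key yields NULL rather than an error.
SELECT
  event_name,
  (SELECT ep.value.string_value
   FROM UNNEST(event_params) AS ep
   WHERE ep.key = 'page_location') AS page_location
FROM `myproject.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
LIMIT 100;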
GA4 writes date-sharded events_YYYYMMDD tables rather than a single partitioned table, so wildcard queries with a _TABLE_SUFFIX filter act as the de facto partition pruning. For heavy workloads, materialize the export into your own date-partitioned table and add clustering on event_name or user_pseudo_id to accelerate selective queries.
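A sketch of that materialization, assuming the hypothetical analytics_modeled dataset from earlier (a one-off CTAS; an incremental pipeline would append day by day instead):

-- Copy the sharded export into a date-partitioned, clustered table you own.
CREATE TABLE `myproject.analytics_modeled.events`
PARTITION BY event_dt
CLUSTER BY event_name, user_pseudo_id
AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_dt,
  *
FROM `myproject.analytics_123456789.events_*`;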
Consider CREATE MATERIALIZED VIEW for common aggregates to reduce query costs; a sketch follows.
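A minimal materialized-view sketch over the consolidated table from the previous example (materialized views cannot reference wildcard tables, so a single base table is assumed; all names are illustrative):

-- Pre-aggregated daily event counts, refreshed automatically by BigQuery.
CREATE MATERIALIZED VIEW `myproject.analytics_modeled.daily_event_counts` AS
SELECT
  event_dt,
  event_name,
  COUNT(*) AS events
FROM `myproject.analytics_modeled.events`
GROUP BY event_dt, event_name;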
Cases where a custom pipeline makes sense include backfilling data from before the native link existed or landing API-derived aggregates in tables you control, loading batches with bq load or a load job set to WRITE_APPEND. The following script illustrates the API-to-BigQuery pattern.
from google.analytics.data_v1beta import (
    BetaAnalyticsDataClient, DateRange, Dimension, Metric, RunReportRequest
)
from google.cloud import bigquery

PROPERTY_ID = "123456789"
BQ_TABLE = "myproject.analytics.events_raw"

ga_client = BetaAnalyticsDataClient()
bq_client = bigquery.Client()

request = RunReportRequest(
    property=f"properties/{PROPERTY_ID}",
    date_ranges=[DateRange(start_date="yesterday", end_date="yesterday")],
    dimensions=[Dimension(name="eventName"), Dimension(name="date")],
    metrics=[Metric(name="eventCount")],
)
response = ga_client.run_report(request)

# Flatten API rows into dicts matching the destination table's columns
# (event_name, event_date, event_count are assumed to exist on events_raw).
rows = [
    {
        "event_name": row.dimension_values[0].value,
        "event_date": row.dimension_values[1].value,
        "event_count": int(row.metric_values[0].value),
    }
    for row in response.rows
]

errors = bq_client.insert_rows_json(BQ_TABLE, rows)
if errors:
    raise RuntimeError(errors)
This script can be wrapped in Cloud Functions or Cloud Run and scheduled via Cloud Scheduler.
Use BigQuery’s schemaUpdateOptions=["ALLOW_FIELD_ADDITION"] flag in load jobs, and adopt a late-binding view pattern so that reporting queries do not break when new columns appear; one simple form of such a view is sketched below.
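A minimal sketch of a late-binding view that pins the columns downstream models rely on (view and dataset names are assumptions):

-- Downstream queries read from this view instead of events_* directly,
-- so new export columns stay invisible until the view is updated.
CREATE OR REPLACE VIEW `myproject.analytics_modeled.events_stable` AS
SELECT
  PARSE_DATE('%Y%m%d', event_date) AS event_dt,
  event_timestamp,
  event_name,
  user_pseudo_id,
  event_params,
  user_properties
FROM `myproject.analytics_123456789.events_*`;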
Organize transformations into raw, intermediate, and prod layers to isolate changes. Always filter on _TABLE_SUFFIX when running day-level queries. This avoids scanning all partitions.

-- Aggregate weekly active users (WAU) for last 12 weeks
WITH last_12_weeks AS (
SELECT
PARSE_DATE('%Y%m%d', _TABLE_SUFFIX) AS event_date,
user_pseudo_id
FROM `myproject.analytics_123456789.events_*`
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 84 DAY))
AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY event_date, user_pseudo_id
)
SELECT
FORMAT_DATE('%G-W%V', event_date) AS iso_week,
COUNT(DISTINCT user_pseudo_id) AS wau
FROM last_12_weeks
GROUP BY iso_week
ORDER BY iso_week;
You can paste this query into any SQL editor. If you use the Galaxy desktop client, its AI copilot can automatically convert the date math to your company’s fiscal calendar or suggest clustering columns after it inspects the dataset.
Relying only on the daily tables.
Why it’s wrong: Dashboards built only on daily tables show a 24-hour delay.
Fix: UNION the events_intraday_* tables or build a view that falls back when the daily table isn’t ready; a sketch of such a view follows.
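A minimal sketch, assuming the daily and intraday tables share a schema and reusing the hypothetical analytics_modeled dataset:

-- Combine finalized daily tables with today's streaming rows.
-- The regex keeps events_* from also matching the intraday tables.
CREATE OR REPLACE VIEW `myproject.analytics_modeled.events_fresh` AS
SELECT * FROM `myproject.analytics_123456789.events_*`
WHERE REGEXP_CONTAINS(_TABLE_SUFFIX, r'^[0-9]{8}$')
UNION ALL
SELECT * FROM `myproject.analytics_123456789.events_intraday_*`;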
Assuming a fixed schema.
Why it’s wrong: GA4 adds new columns silently, breaking ETL scripts that expect fixed schemas.
Fix: Use SELECT * EXCEPT or treat new fields as JSON blobs until they are modeled.
Scanning every partition.
Why it’s wrong: Forgetting _TABLE_SUFFIX or partition filters leads to terabytes scanned and high bills.
Fix: Wrap common predicates in authorized views or UDFs so analysts must supply date filters; one option is a table function like the sketch below.
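A minimal sketch using a table-valued function whose date arguments are required (function and dataset names are assumptions):

-- Analysts must pass explicit bounds; the suffix filter is applied for them.
CREATE OR REPLACE TABLE FUNCTION `myproject.analytics_modeled.events_between`(start_date DATE, end_date DATE)
AS (
  SELECT *
  FROM `myproject.analytics_123456789.events_*`
  WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', start_date)
                          AND FORMAT_DATE('%Y%m%d', end_date)
);
-- Usage: SELECT * FROM `myproject.analytics_modeled.events_between`(DATE '2024-05-01', DATE '2024-05-31');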
-- Identify top landing pages by sessions and purchases in the last 30 days
SELECT
event_params.value.string_value AS landing_page,
COUNTIF(event_name = 'session_start') AS sessions,
COUNTIF(event_name = 'purchase') AS purchases
FROM `myproject.analytics_123456789.events_*`,
UNNEST(event_params) AS event_params
WHERE _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
AND FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
AND event_params.key = "page_location"
GROUP BY landing_page
ORDER BY purchases DESC
LIMIT 25;
While ingestion occurs on Google Cloud, exploration often happens in a SQL editor. Galaxy’s desktop client connects directly to BigQuery, and its AI copilot can generate queries that apply _TABLE_SUFFIX filters and partition pruning for you.

GA4’s BigQuery export removes the ceiling that sampling once imposed on web analytics. Whether you rely on the native link or build a custom pipeline, adhering to partitioning best practices, schema governance, and cost monitoring ensures that your marketing and product teams gain fast, trustworthy insights. Integrating a modern SQL editor such as Galaxy further shortens the path from raw events to decision-ready dashboards.
GA4’s direct export into BigQuery is the only free, unsampled path to event-level data. Mastering the ingestion process unlocks advanced attribution, real-time dashboards, and ML use cases while avoiding costly query mistakes and schema-drift outages.
How long until GA4 data shows up in BigQuery?
Initial data usually lands within 24 hours. For streaming exports, the lag is often under one hour.
Does the export cost anything?
The export feature itself is free, but you pay for BigQuery storage and query processing. The streaming export also incurs BigQuery streaming-insert charges on top of storage.
Can I backfill data from before the export was enabled?
Use the GA4 Reporting API to retrieve historical aggregates or the Measurement Protocol combined with Cloud Storage + BigQuery to replicate raw events. Google support can sometimes run a one-time backfill—ask your account rep.
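For the Cloud Storage route, once replicated events are staged as newline-delimited JSON, a minimal load sketch (the bucket path and table name are placeholders, and the destination table is assumed to exist or be auto-created):

-- Load staged NDJSON event files from Cloud Storage into a backfill table.
LOAD DATA INTO `myproject.analytics_modeled.events_backfill`
FROM FILES (
  format = 'JSON',
  uris = ['gs://my-ga4-backfill/events/*.json']
);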
Can Galaxy query GA4 export data?
Yes. Galaxy’s BigQuery connector lets you explore GA4 datasets with AI-generated SQL that automatically handles nested fields and partition filters. You can also share endorsed queries in Collections so teams avoid reinventing the wheel.