Auditing BigQuery Access Logs

Galaxy Glossary

How do I audit BigQuery access logs to see who accessed which data and when?

Auditing BigQuery access logs means systematically collecting, querying, and interpreting Cloud Logging entries that record every read, write, and metadata operation on BigQuery resources so you can answer who did what, when, from where, and with which data.



Overview

Google Cloud’s BigQuery automatically writes Admin Activity and Data Access logs to Cloud Logging. Learning to audit these logs is critical for security, compliance, cost control, and troubleshooting. This article walks you through how BigQuery access logging works, how to design effective audit queries, and how to operationalize the results.

Why Audit BigQuery Access Logs?

  • Security & Compliance – Detect unauthorized access, meet regulatory requirements like SOC 2 or HIPAA.
  • Data Governance – Understand data lineage and prove who queried sensitive datasets.
  • Cost Optimization – Attribute expensive queries to users and teams.
  • Operational Troubleshooting – Trace deletions, schema updates, and failed jobs.

Logging Primer

Log Types

BigQuery writes two relevant log types:

  1. Admin Activity – Always on and cannot be disabled. Captures administrative operations such as table creation, deletion, and load jobs.
  2. Data Access – Records query jobs, table reads (DATA_READ), and DML writes (DATA_WRITE). Unlike most Google Cloud services, BigQuery writes these logs by default; for other services, DATA_READ and DATA_WRITE must be explicitly enabled at the project, folder, or organization level.

Where Logs Live

Entries are stored in projects/<PROJECT_ID>/logs/cloudaudit.googleapis.com%2Fdata_access (and similar) within Cloud Logging. They can be:

  • Queried directly with Cloud Logging’s Log Explorer
  • Exported to BigQuery, Cloud Storage, or Pub/Sub via Log Sinks
  • Monitored in real time with alerts or SIEM integrations

Core Audit Fields

Every log entry is a JSON payload. Key fields for auditing include the following (a query sketch that pulls them out appears after the list):

  • protoPayload.authenticationInfo.principalEmail – The user/service account.
  • protoPayload.methodName – E.g., jobservice.jobcompleted.
  • protoPayload.serviceData.jobCompletedEvent.job.jobStatistics.totalProcessedBytes – Bytes scanned by a completed query job.
  • resource.labels.project_id, dataset_id, table_id
  • severity, timestamp, status.code
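
The following is a minimal sketch that surfaces these fields from the exported Data Access table. It assumes the export dataset and protoPayload field names used throughout this article (my-project.audit_logs and the paths above); your sink's schema may flatten or rename them differently.

-- Sketch: surface the core audit fields for the last 7 days.
-- Assumes the log sink described later in this article; adjust names to your export schema.
SELECT
  timestamp,
  severity,
  protoPayload.authenticationInfo.principalEmail AS principal,
  protoPayload.methodName AS method,
  protoPayload.serviceData.jobCompletedEvent.job.jobStatistics.totalProcessedBytes AS bytes_processed,
  resource.labels.project_id AS project_id
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`
WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY);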

Building an Audit Pipeline

1. Confirm Data Access Logs Are Being Captured

BigQuery writes Data Access logs automatically, so the first step is to confirm they are actually reaching Cloud Logging. In the Google Cloud console, open Logging > Logs Explorer and check for recent entries with resource.type="bigquery_resource", and make sure no Logs Router exclusion filter is dropping them. To route them into a dataset for analysis, create a sink with the gcloud CLI:

gcloud logging sinks create bq_audit_sink \
  bigquery.googleapis.com/projects/my-project/datasets/logs_dataset \
  --log-filter='resource.type="bigquery_resource"'

2. Route Logs to BigQuery

Most teams export logs to a dedicated dataset (e.g., audit_logs) using ingestion-time partitioned tables, so audit queries can filter on _PARTITIONTIME for cost-efficient scans.

3. Structure with Views

Create views like vw_user_query_activity that normalize fields and feed stakeholder-friendly dashboards.
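
As a sketch, such a view might look like the following (the view name vw_user_query_activity comes from above; the specific column choices are assumptions to adapt):

-- Sketch: normalize raw audit entries into a stakeholder-friendly shape.
CREATE OR REPLACE VIEW `my-project.audit_logs.vw_user_query_activity` AS
SELECT
  timestamp,
  protoPayload.authenticationInfo.principalEmail AS user_email,
  protoPayload.methodName AS method,
  protoPayload.resourceName AS resource_name,
  protoPayload.serviceData.jobCompletedEvent.job.jobStatistics.totalProcessedBytes AS bytes_processed
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`
WHERE protoPayload.methodName = 'jobservice.jobcompleted';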

4. Automate Alerts

Set up log-based alerts for high-risk events such as DML against sensitive tables or unusually expensive queries (more than 1 TB processed).
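
The alerts themselves are configured in Cloud Monitoring, but the underlying condition can be prototyped as a scheduled query against the export. A sketch (the 1 TB threshold and table name are assumptions to adapt):

-- Sketch: query jobs from the last hour that scanned more than 1 TB.
-- Schedule this and route non-empty results to your alerting channel.
SELECT
  timestamp,
  protoPayload.authenticationInfo.principalEmail AS user_email,
  protoPayload.serviceData.jobCompletedEvent.job.jobStatistics.totalProcessedBytes / 1e12 AS tb_scanned
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`
WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND protoPayload.serviceData.jobCompletedEvent.job.jobStatistics.totalProcessedBytes > 1e12;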

Practical Query Patterns

Find Out Who Queried a Table

SELECT
  TIMESTAMP_TRUNC(timestamp, DAY) AS event_day,
  protoPayload.authenticationInfo.principalEmail AS user,
  protoPayload.serviceData.jobCompletedEvent.job.jobStatistics.totalProcessedBytes / 1e9 AS gb_processed
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`,
  UNNEST(protoPayload.serviceData.jobCompletedEvent.job.jobStatistics.referencedTables) AS ref
WHERE
  resource.type = 'bigquery_resource'
  AND protoPayload.methodName = 'jobservice.jobcompleted'
  AND ref.projectId = 'my-project'
  AND ref.datasetId = 'my_dataset'
  AND ref.tableId = 'my_table'
  AND _PARTITIONTIME BETWEEN TIMESTAMP '2023-01-01' AND TIMESTAMP '2023-01-31';

Detect Large, Costly Queries

SELECT
  timestamp,
  protoPayload.authenticationInfo.principalEmail AS user,
  job_stats.totalProcessedBytes / 1e12 AS tb_scanned
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`,
  UNNEST([protoPayload.serviceData.jobCompletedEvent.job.jobStatistics]) AS job_stats
WHERE job_stats.totalProcessedBytes > 1e12  -- more than 1 TB scanned
ORDER BY tb_scanned DESC;

Track Deleted Tables

SELECT
  timestamp,
  protoPayload.authenticationInfo.principalEmail AS actor,
  protoPayload.resourceName AS table_path
FROM `my-project.audit_logs.cloudaudit_googleapis_com_activity`
WHERE protoPayload.methodName = 'tableservice.delete'
ORDER BY timestamp DESC;

Operationalizing the Results

  1. Dashboards – Use Looker Studio or Grafana BigQuery connector for visualizations.
  2. Data Catalog – Push curated metrics (last access, top users) back to a metadata table (see the sketch after this list).
  3. Automation – Auto-isolate service accounts that exceed cost thresholds with Cloud Functions.
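
For the Data Catalog step, a daily summary like the following sketch works (table_access_summary is a hypothetical destination table; field paths follow the conventions used earlier):

-- Sketch: per-resource last access, top user, and access count over 30 days.
CREATE OR REPLACE TABLE `my-project.audit_logs.table_access_summary` AS
SELECT
  protoPayload.resourceName AS table_path,
  MAX(timestamp) AS last_accessed_at,
  APPROX_TOP_COUNT(protoPayload.authenticationInfo.principalEmail, 1)[OFFSET(0)].value AS top_user,
  COUNT(*) AS access_count
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`
WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY table_path;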

Best Practices

  • Least Privilege – Restrict who can disable log sinks.
  • Partition & Cluster – Partition on time, cluster on principalEmail and methodName.
  • Retention Policy – Set table or partition expiration to match regulatory requirements (see the sketch after this list).
  • Document Views – Provide data stewards with simple, documented SQL views.
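
For the retention point, partition expiration can be set directly on the exported table. A sketch (the 400-day figure is a placeholder; match it to your own policy):

-- Sketch: expire audit-log partitions after 400 days.
ALTER TABLE `my-project.audit_logs.cloudaudit_googleapis_com_data_access`
SET OPTIONS (partition_expiration_days = 400);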

Galaxy Connection

While Galaxy is primarily a SQL editor, its AI-assisted autocomplete and parameterization make writing the complex audit queries above dramatically faster. Store your standard audit queries in a Galaxy Collection so security and data teams can endorse and reuse them instead of copy-pasting SQL into Slack.

Common Mistakes & Fixes

1. Assuming the Logs Aren't There (or Letting Them Get Dropped)

BigQuery's Data Access logs are written by default, but Logs Router exclusion filters or a missing sink can silently discard them, and DATA_READ/DATA_WRITE logging for other services stays off until you enable it. Fix: verify the audit log configuration at the desired resource hierarchy level and confirm entries are actually reaching your export dataset.

2. Not Exporting Logs Before the 30-Day Window

Cloud Logging's _Default bucket retains entries for only 30 days. Create a sink to BigQuery or Cloud Storage, or you'll lose history.

3. Querying Raw JSON Each Time

Parsing protoPayload on the fly is slow and expensive. Create schema-on-read views or flatten fields into materialized tables.
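
One way to do the flattening, as a sketch (audit_flat is a hypothetical table name; the selected fields mirror the queries above, and a scheduled INSERT can keep it current):

-- Sketch: flatten frequently used fields into a partitioned, clustered table.
CREATE OR REPLACE TABLE `my-project.audit_logs.audit_flat`
PARTITION BY DATE(timestamp)
CLUSTER BY principal_email, method_name AS
SELECT
  timestamp,
  protoPayload.authenticationInfo.principalEmail AS principal_email,
  protoPayload.methodName AS method_name,
  protoPayload.resourceName AS resource_name,
  protoPayload.serviceData.jobCompletedEvent.job.jobStatistics.totalProcessedBytes AS bytes_processed
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`
WHERE _PARTITIONTIME >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY);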

Next Steps

  • Route BigQuery audit logs to a dataset in a sandbox project today.
  • Build the example views above.
  • Integrate alerts with Cloud Monitoring or your SIEM.

Why Auditing BigQuery Access Logs is important

BigQuery can store terabytes of sensitive data. Without robust auditing you can’t prove compliance, attribute costs, or detect malicious behavior. Capturing and analyzing access logs provides an immutable trail showing what data was read or modified, by whom, and at what cost—critical for SOC 2, HIPAA, GDPR, and internal governance policies.


Frequently Asked Questions (FAQs)

How long does BigQuery keep access logs by default?

Cloud Logging stores log entries for 30 days at no charge. After that, entries are pruned unless you route them to BigQuery, Cloud Storage, or another destination.

Can I see the exact SQL text in the logs?

Yes—Data Access entries for completed query jobs include the full SQL text in protoPayload.serviceData.jobCompletedEvent.job.jobConfiguration.query.query. Remember to control access to the log data itself because it may contain sensitive literals.

How can I reduce the cost of querying huge log tables?

Partition on ingestion time and cluster by high-cardinality fields such as principalEmail. You can also materialize daily or hourly summary tables and grant analysts access only to those derived tables.

Does Galaxy help with BigQuery audit logging?

Galaxy’s context-aware SQL editor speeds up writing and sharing complex audit queries. You can save reusable log-analysis snippets in Galaxy Collections and let your team endorse them for future use. However, Galaxy itself does not collect the logs; it simply provides a superior query interface.
