Integrating OpenAI’s ChatGPT with Google BigQuery so the model can generate and run SQL queries against datasets stored in BigQuery.
Connecting ChatGPT to BigQuery means giving an OpenAI language model programmatic access to Google’s serverless data warehouse so it can generate, execute, and interpret SQL on your organization’s data. Because ChatGPT itself cannot natively open network sockets or authenticate to external services, the connection is brokered through code you write—typically with Python, JavaScript, or a no-code tool—that sits between OpenAI’s API and BigQuery’s REST API.
BigQuery authenticates through Google Cloud service accounts and OAuth scopes. The safest pattern is to create a dedicated service-account@project.iam.gserviceaccount.com with the roles/bigquery.jobUser role (so it can run query jobs) and the roles/bigquery.dataViewer role restricted to the specific datasets ChatGPT should query. Generate a JSON key file and store it in a secret manager such as Google Secret Manager, AWS Secrets Manager, or HashiCorp Vault; never commit keys to Git.
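As a sketch, the service account above can be provisioned with the gcloud CLI. The project ID my-project and the account name chatgpt-bq are placeholders; substitute your own, and tighten read access with dataset-level access controls where possible:

```shell
# Create a dedicated service account for the gateway (names are placeholders).
gcloud iam service-accounts create chatgpt-bq \
  --display-name="ChatGPT BigQuery gateway"

# Allow the account to run query jobs in the project...
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:chatgpt-bq@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# ...and grant read access. For tighter scoping, grant dataViewer on
# individual datasets via dataset-level access controls instead of project-wide.
gcloud projects add-iam-policy-binding my-project \
  --member="serviceAccount:chatgpt-bq@my-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

# Create the JSON key, then upload it to your secret manager and delete it locally.
gcloud iam service-accounts keys create key.json \
  --iam-account=chatgpt-bq@my-project.iam.gserviceaccount.com
```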
You build a small server (e.g., FastAPI, Express, Cloud Functions) that:

1. Receives the user’s natural-language question.
2. Calls OpenAI’s /v1/chat/completions endpoint with system instructions like “You are a BigQuery SQL expert” plus the user prompt and schema context.
3. Validates the generated SQL, blocking destructive statements (DELETE, UPDATE).
4. Executes the query in BigQuery and sends a preview of the result (e.g., LIMIT 100) back to ChatGPT so it can craft an explanation.

OpenAI and BigQuery both charge per usage, so implement caching (e.g., Redis) for repeated prompts, and set sensible limits on rows returned. For complex analysis, consider asynchronous job polling and background workers so the chat UI stays responsive.
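The caching idea can be sketched as follows. This uses an in-process dict as a stand-in for Redis so the example is self-contained; in production you would swap `cache` for a `redis.Redis` client with a TTL. The helper names are illustrative, not part of any library:

```python
import hashlib
import json

# In-process stand-in for the Redis cache described above.
cache: dict[str, str] = {}

def cache_key(question: str, schema_context: str) -> str:
    # The same question against the same schema should yield the same SQL,
    # so hash both together to form the cache key.
    raw = json.dumps({"q": question, "schema": schema_context})
    return hashlib.sha256(raw.encode()).hexdigest()

def get_or_generate_sql(question: str, schema_context: str, generate) -> str:
    # `generate` is the (billed) OpenAI call; skip it on a cache hit.
    key = cache_key(question, schema_context)
    if key not in cache:
        cache[key] = generate(question, schema_context)
    return cache[key]
```

Repeated prompts then cost one OpenAI call instead of one per request.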
```bash
pip install google-cloud-bigquery openai fastapi uvicorn python-dotenv pandas db-dtypes tabulate
```

(`pandas` and `db-dtypes` are needed for `to_dataframe()`, and `tabulate` for `to_markdown()`.)
```python
# main.py
import os
import re

from fastapi import FastAPI, HTTPException
from google.cloud import bigquery
from openai import OpenAI
from pydantic import BaseModel

oai = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))
bq = bigquery.Client.from_service_account_json(os.getenv("BQ_KEY_PATH"))
app = FastAPI(title="ChatGPT ↔ BigQuery Gateway")

class Prompt(BaseModel):
    question: str

SYSTEM_MSG = """
You are a senior data analyst. Generate BigQuery Standard SQL only.
Disallow DELETE, UPDATE, INSERT, or DDL.
When possible, add LIMIT 100.
"""

@app.post("/ask")
async def ask_bigquery(prompt: Prompt):
    try:
        # 1. Get BigQuery information schema for context
        tables = bq.query("""
            SELECT table_name, column_name, data_type
            FROM `project.dataset`.INFORMATION_SCHEMA.COLUMNS
            LIMIT 1000
        """).result().to_dataframe()
        schema_context = "\n".join(
            f"{row.table_name}({row.column_name}:{row.data_type})"
            for _, row in tables.iterrows()
        )

        # 2. Ask ChatGPT for SQL
        resp = oai.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": SYSTEM_MSG},
                {"role": "user", "content": prompt.question + "\nSchema:\n" + schema_context},
            ],
            temperature=0.2,
        )
        sql = resp.choices[0].message.content.strip()
        # Models often wrap SQL in markdown code fences; strip them before running.
        sql = re.sub(r"^```(?:sql)?\s*|\s*```$", "", sql)
        if any(word in sql.upper() for word in ["DELETE", "UPDATE", "INSERT", "CREATE", "DROP"]):
            raise HTTPException(400, "Unsafe SQL blocked")

        # 3. Run SQL in BigQuery
        result = bq.query(sql).result().to_dataframe()

        # 4. Ask ChatGPT to explain the results
        explanation = oai.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful data analyst."},
                {"role": "assistant", "content": sql},
                {"role": "user", "content": f"Here are the first rows:\n{result.head().to_markdown()}\nExplain."},
            ],
            temperature=0.3,
        )
        return {
            "sql": sql,
            "rows": result.to_dict(orient="records"),
            "explanation": explanation.choices[0].message.content,
        }
    except HTTPException:
        raise  # preserve the 400 from the safety check instead of masking it as a 500
    except Exception as e:
        raise HTTPException(500, str(e))
```
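One caveat about the keyword blocklist above: a plain substring check flags harmless identifiers such as a `last_updated` or `created_at` column (they contain “UPDATE” and “CREATE”). A word-boundary regex is a sketch of a tighter check; the helper name `is_unsafe` is illustrative:

```python
import re

BLOCKED = ("DELETE", "UPDATE", "INSERT", "CREATE", "DROP")

def is_unsafe(sql: str) -> bool:
    # Match blocked verbs only as whole words, so column names like
    # `last_updated` or `created_at` do not trigger false positives.
    return any(re.search(rf"\b{kw}\b", sql, re.IGNORECASE) for kw in BLOCKED)
```

Even this is only a first line of defense; the read-only service account remains the real guarantee that destructive SQL cannot run.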
Deploy this API to Cloud Run or another secure environment. Your front-end (or Galaxy’s SQL editor via an extension) can now call /ask and display the enriched answer.

Galaxy is a developer-centric SQL editor with a context-aware AI copilot. Galaxy does not proxy OpenAI requests to BigQuery on your behalf, but its copilot can draft and optimize BigQuery SQL in the editor, and its Collections feature lets teams share approved queries.

To control spend, enforce a LIMIT on generated queries and set maximumBytesBilled on BigQuery jobs to cap the bytes a single query may scan.
BigQuery houses critical organizational data. Allowing ChatGPT to access it lets teams ask natural questions, automate insights, and accelerate analytics without manually writing SQL for every query. Doing so safely unlocks productivity while guarding cost, privacy, and compliance.
ChatGPT cannot connect to BigQuery on its own; it only generates text. An external application must handle authentication and make REST or client-library calls to BigQuery.
To authenticate securely, create a dedicated Google Cloud service account with read-only BigQuery roles and store its key in a secure secret manager.
Galaxy’s AI copilot can draft and optimize BigQuery SQL in the editor, and its Collections feature lets teams share approved queries generated by ChatGPT, ensuring consistency and security.
Fine-tuning the model on your schema is usually unnecessary: supplying a structured schema snippet in the prompt is enough, and fine-tuning is optional and expensive.