Installs Google Cloud SDK, authenticates, and configures BigQuery so you can create datasets and run SQL from a Linux shell.
Linux servers and developer machines can run the Google Cloud SDK, letting you automate BigQuery jobs, load data, and schedule queries without a GUI.
Install curl, Python 3.8+, and unzip. Create or pick a Google Cloud project and ensure Billing and BigQuery APIs are enabled.
Download, extract, and run the install script:
curl -O https://dl.google.com/dl/cloudsdk/channels/rapid/downloads/google-cloud-cli-441.0.0-linux-x86_64.tar.gz
tar -xf google-cloud-cli-*.tar.gz
./google-cloud-sdk/install.sh
Run gcloud init --project=<PROJECT_ID>
. Follow the browser prompt to log in. Service accounts can be activated with gcloud auth activate-service-account --key-file=key.json
.
Create a US-region dataset named ecommerce
:
bq --location=US mk --dataset <PROJECT_ID>:ecommerce
Autodetect schema or supply field definitions:
bq load --autodetect --source_format=CSV \
ecommerce.Customers ./Customers.csv \
id:INTEGER,name:STRING,email:STRING,created_at:TIMESTAMP
Repeat for Orders
, Products
, and OrderItems
.
Use the bq query
command or any SQL editor like Galaxy:
bq query --use_legacy_sql=false \
"SELECT c.name, SUM(oi.quantity*p.price) AS lifetime_value
FROM `ecommerce.Customers` c
JOIN `ecommerce.Orders` o ON o.customer_id = c.id
JOIN `ecommerce.OrderItems` oi ON oi.order_id = o.id
JOIN `ecommerce.Products` p ON p.id = oi.product_id
GROUP BY c.name
ORDER BY lifetime_value DESC
LIMIT 10;"
Store service-account keys securely, use datasets per environment, set default project with bq --project_id
, and script loads in CI for repeatability.
Skipping authentication or forgetting --location
cause “Not found” errors. Always match dataset location to load jobs.
Yes. The bq
and gcloud
binaries are part of the SDK. Install them on each server or use a container image that already includes the tools.
Absolutely. Script your bq load
commands, store credentials in a secure location, and trigger the script with cron or any scheduler.
Yes. Use the --parameter
flag with bq query
or write parameterized SQL in tools like Galaxy to avoid SQL injection and improve reuse.