Running Apache Superset in Kubernetes means packaging Superset’s web server, Celery workers, metadata DB, and dependencies as containerized workloads managed by Kubernetes, enabling scalable, resilient, and repeatable BI deployments.
Running Apache Superset in Kubernetes
Learn why and how to deploy Apache Superset on Kubernetes for scalable, production-grade business-intelligence, including Helm-based installation, persistence, scaling, and common pitfalls.
Apache Superset is a modern, open-source business-intelligence (BI) platform that lets users explore data and build interactive dashboards through a browser-based UI. It ships with a SQL editor, rich visualization library, metadata caching, and a robust role-based access control (RBAC) model.
Kubernetes (K8s) has become the de facto standard for orchestrating containerized applications. Deploying Superset on K8s offers several advantages:
A production Superset cluster typically contains five core components:
Kubernetes lets you containerize each piece, wire them together with Services, and expose the UI with an Ingress.
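As a rough illustration of that wiring, the sketch below pairs a ClusterIP Service with an Ingress for the web tier. The resource names and the app label are placeholders, port 8088 is assumed to be the port your Superset container listens on, and the Ingress class is left to the cluster default.

```yaml
# Sketch only: names and labels are placeholders, not values taken from any chart.
apiVersion: v1
kind: Service
metadata:
  name: superset-web
  namespace: superset
spec:
  type: ClusterIP
  selector:
    app: superset-web          # must match the labels on the web pods
  ports:
    - name: http
      port: 8088
      targetPort: 8088         # assumed container port; adjust to your image
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: superset
  namespace: superset
spec:
  tls:
    - hosts:
        - superset.example.com
      secretName: superset-tls # TLS certificate stored as a Secret
  rules:
    - host: superset.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: superset-web
                port:
                  number: 8088
```

Only the web tier needs a Service reachable from the Ingress; Celery workers pull tasks from the broker and do not need inbound traffic.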
The apache/superset Helm chart maintained by the Superset community encapsulates best-practice manifests and sane defaults. Using Helm allows parameter overrides via values.yaml, version pinning, and one-command upgrades.
If you need fine-grained control, craft your own Deployment, StatefulSet, Secret, and Ingress objects. This path requires more maintenance but offers maximum flexibility.
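For the hand-written route, a web-tier Deployment might start out like the sketch below. It assumes the public apache/superset image serving on port 8088 and exposing a /health endpoint. The names, replica count, and resource figures are illustrative, and the database, Redis, and secret wiring is deliberately left out.

```yaml
# Minimal sketch of a web Deployment; configuration and secrets are omitted.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: superset-web           # placeholder name
  namespace: superset
spec:
  replicas: 2
  selector:
    matchLabels:
      app: superset-web
  template:
    metadata:
      labels:
        app: superset-web
    spec:
      containers:
        - name: superset
          image: apache/superset:3.0.1   # pin an explicit tag rather than "latest"
          ports:
            - containerPort: 8088        # assumed listen port
          readinessProbe:
            httpGet:
              path: /health              # assumed health endpoint
              port: 8088
            initialDelaySeconds: 15
            periodSeconds: 10
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
            limits:
              memory: 2Gi                # illustrative; size from observed usage
```

The readiness probe keeps a pod out of the Service until Superset answers, which is what makes rolling updates safe with more than one replica.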
Some teams wrap Superset inside a custom operator to automate database migrations and configuration seeding. Operators add CRDs but can simplify Day-2 operations.
helm repo add superset https://apache.github.io/superset
kubectl create ns superset
kubectl -n superset create secret generic superset-secrets \
--from-literal=secret_key="$(openssl rand -hex 32)" \
--from-literal=db_pass="POSTGRES_PASSWORD"
Storing secrets in Kubernetes Secrets (or an external vault) avoids hard-coding sensitive values in values.yaml.
values.yaml
# Key names and nesting differ between chart versions; compare against `helm show values superset/superset`.
replicaCount: 2
configOverrides:
  SECRET_KEY: "{{ .Values.global.supersetSecretKey }}"
  SQLALCHEMY_DATABASE_URI: "postgresql+psycopg2://superset:{{ .Values.global.dbPass }}@postgres.example:5432/superset"
  REDIS_HOST: redis-master
service:
  type: ClusterIP
ingress:
  enabled: true
  hosts:
    - host: superset.example.com
      paths: ["/"]
  tls:
    - secretName: superset-tls
      hosts:
        - superset.example.com
helm upgrade --install superset superset/superset -n superset -f values.yaml
kubectl -n superset exec -it deploy/superset -- bash -c "superset db upgrade && superset fab create-admin && superset init"
On new clusters, run the migrations, create the first admin user, and initialize the default roles and permissions.
Superset is largely stateless once it connects to Postgres and Redis. However, you may enable a shared uploads volume for CSV exports and custom images. Use a PersistentVolume (EFS, Filestore, etc.) mounted to both web and worker pods.
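A shared volume of this kind is usually requested through a PersistentVolumeClaim with the ReadWriteMany access mode, since web and worker pods may run on different nodes. In the sketch below the claim name, storage class, size, and mount path are all hypothetical.

```yaml
# Sketch: a shared claim plus the pod-template fragment that mounts it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: superset-uploads           # hypothetical name
  namespace: superset
spec:
  accessModes:
    - ReadWriteMany                # several pods on different nodes mount it at once
  storageClassName: shared-files   # hypothetical class backed by EFS, Filestore, or similar
  resources:
    requests:
      storage: 10Gi
---
# Fragment of the pod spec, repeated in both the web and worker Deployments.
volumes:
  - name: uploads
    persistentVolumeClaim:
      claimName: superset-uploads
containers:
  - name: superset
    volumeMounts:
      - name: uploads
        mountPath: /app/uploads    # hypothetical path; point your upload settings here
```

Block-storage classes typically offer only ReadWriteOnce, so the claim will stay pending unless the storage class is backed by a file-style service.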
HorizontalPodAutoscaler.
Follow these guidelines:
Store SECRET_KEY, DB creds, and OAuth client secrets in Secrets or a cloud key manager.
Keep your values.yaml and overlays in Git. A GitOps engine like Argo CD or Flux will diff and sync changes automatically. To upgrade Superset, commit a version bump (image.tag: 3.0.1) and let the controller perform a rolling deploy.
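With Argo CD, that Git-to-cluster link is declared as an Application resource. The sketch below assumes Argo CD is installed in the argocd namespace; the repository URL and directory are hypothetical.

```yaml
# Sketch of an Argo CD Application; the repo URL and path are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: superset
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/superset-deploy.git  # hypothetical repo
    targetRevision: main
    path: envs/prod              # hypothetical directory holding the chart wrapper and values.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: superset
  syncPolicy:
    automated:
      prune: true                # delete resources that were removed from Git
      selfHeal: true             # revert manual drift in the cluster
    syncOptions:
      - CreateNamespace=true
```

With automated sync enabled, merging the commit that bumps image.tag is the whole upgrade procedure from the operator's side; database migrations still have to run as a separate step.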
prometheus-flask-exporter plugin.
Why it’s wrong: Running Postgres inside a pod without a PVC risks data loss on reschedule.
Fix: Use a StatefulSet+PVC or an external managed database.
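If you do keep Postgres in the cluster, the StatefulSet-plus-PVC fix could look roughly like this. It is a single-replica sketch, not a highly available setup. The names, image tag, and volume size are illustrative, and it reuses the superset-secrets Secret and db_pass key created earlier.

```yaml
# Sketch: single-instance Postgres with a persistent volume per pod.
apiVersion: v1
kind: Service
metadata:
  name: superset-postgres
  namespace: superset
spec:
  clusterIP: None                # headless Service required by the StatefulSet
  selector:
    app: superset-postgres
  ports:
    - port: 5432
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: superset-postgres
  namespace: superset
spec:
  serviceName: superset-postgres
  replicas: 1
  selector:
    matchLabels:
      app: superset-postgres
  template:
    metadata:
      labels:
        app: superset-postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15     # illustrative tag
          env:
            - name: POSTGRES_USER
              value: superset
            - name: POSTGRES_DB
              value: superset
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: superset-secrets
                  key: db_pass
            - name: PGDATA       # subdirectory avoids clashes with the volume's lost+found
              value: /var/lib/postgresql/data/pgdata
          ports:
            - containerPort: 5432
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 20Gi        # illustrative size
```

The claim created from volumeClaimTemplates outlives the pod, so a reschedule reattaches the same data instead of starting from an empty database.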
SECRET_KEY
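Why it’s wrong: Superset uses SECRET_KEY to sign session cookies and to encrypt sensitive values in the metadata database. If the key is missing or differs between pods and restarts, sessions are invalidated and stored credentials cannot be decrypted; recent releases also refuse to start in production with the default key.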
Fix: Supply a stable SECRET_KEY via a Secret mounted as env var.
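In a pod template, that fix comes down to a secretKeyRef entry. The fragment below reuses the superset-secrets Secret and its secret_key entry from the install steps. The variable name SUPERSET_SECRET_KEY is an assumption: recent Superset defaults are understood to read it, but if your version does not, read the variable yourself in superset_config.py.

```yaml
# Fragment of a container spec; add it to both the web and worker pod templates.
env:
  - name: SUPERSET_SECRET_KEY    # assumed variable name; verify against your Superset config
    valueFrom:
      secretKeyRef:
        name: superset-secrets
        key: secret_key
```

Every web and worker pod must receive the same key, otherwise a session signed by one replica is rejected by the next.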
Why it’s wrong: Running too many concurrent tasks exhausts Redis and DB connections, causing dashboard timeouts.
Fix: Benchmark workload, tune worker_concurrency, and use an HPA driven by queue_latency.
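One way to keep concurrency explicit is to set it on the worker's command line instead of relying on Celery's default of one process per CPU core. The sketch below assumes the Celery app path superset.tasks.celery_app:app; the Deployment name, replica count, and concurrency value are placeholders, to be replaced by figures from your own benchmark.

```yaml
# Sketch of a Celery worker Deployment with a fixed concurrency.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: superset-worker          # placeholder name
  namespace: superset
spec:
  replicas: 2
  selector:
    matchLabels:
      app: superset-worker
  template:
    metadata:
      labels:
        app: superset-worker
    spec:
      containers:
        - name: worker
          image: apache/superset:3.0.1
          command:
            - celery
            - --app=superset.tasks.celery_app:app   # assumed Celery app path
            - worker
            - --concurrency=4    # placeholder; derive from benchmarks
          resources:
            requests:
              cpu: "1"
              memory: 2Gi
            limits:
              memory: 4Gi
```

Peak database and Redis connections grow roughly with replicas times concurrency, so check that product against the connection limits of both backends before raising either number.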
Superset’s built-in SQL Lab editor is convenient for analysts, but many engineers prefer a dedicated IDE-style experience. Galaxy offers a lightning-fast SQL editor with AI copilot and versioned query sharing. Teams often:
This workflow separates ad-hoc exploration (Galaxy) from governed visualization (Superset).
Deploying Apache Superset on Kubernetes marries a powerful BI platform with a resilient orchestration layer. By containerizing web and worker services, externalizing state, and embracing Helm or GitOps, teams achieve repeatable, scalable analytics infrastructure. Avoid the common pitfalls (ephemeral database storage, unmanaged secrets, and badly sized workers) and you’ll unlock interactive dashboards that scale right alongside your data.
Modern data teams need BI tools that grow with traffic and data volume. Containerizing Superset lets engineers deploy, scale, and update dashboards the same way they manage microservices. Kubernetes unlocks high availability, declarative configuration, and cloud-agnostic portability, ensuring BI stays online even during node failures or peak loads.
Yes. The metadata DB should live outside the cluster or on a StatefulSet with persistent volumes. Managed Postgres or CloudSQL is common.
Bump the image.tag field in your values.yaml or pass --set image.tag=X.Y.Z, then run helm upgrade. Helm performs a rolling update; remember to execute superset db upgrade afterward.
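If you maintain your own chart or a wrapper chart, the migration step can be automated with a Helm hook Job along the lines of the sketch below. The community chart may already ship an init job that does this, so check before adding another. The Job name is hypothetical, and the container needs the same database configuration and secrets as the web pods, which are omitted here.

```yaml
# Sketch of a post-upgrade hook Job; it must live in a chart's templates/ directory.
apiVersion: batch/v1
kind: Job
metadata:
  name: superset-db-upgrade      # hypothetical name
  annotations:
    "helm.sh/hook": post-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: apache/superset:3.0.1   # keep in step with image.tag
          command: ["superset", "db", "upgrade"]
          # env and config mounts omitted: reuse those of the web Deployment
```

Helm waits for the hook Job to finish before reporting the release as upgraded, so a failed migration surfaces as a failed upgrade.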
Absolutely. Galaxy offers a developer-centric SQL editor with AI assistance. You can develop, share, and endorse SQL in Galaxy, then paste the final query into Superset’s SQL Lab or save it as a dataset for dashboards.
Create a separate Deployment for Celery workers and attach a HorizontalPodAutoscaler based on CPU or custom queue-length metrics.
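A CPU-based autoscaler for the workers might look like the sketch below. It assumes a worker Deployment named superset-worker (a placeholder) and a metrics server running in the cluster; the replica bounds and utilization target are illustrative.

```yaml
# Sketch of a CPU-driven HPA for the Celery worker Deployment.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: superset-worker
  namespace: superset
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: superset-worker        # placeholder; must match your worker Deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70 # percent of the pods' CPU requests
```

CPU utilization is measured against the containers' CPU requests, so the worker pods must declare them. Scaling on queue length instead requires a custom or external metrics adapter, which is not shown here.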