This resource breaks down the day-to-day work of a data analyst—from gathering and cleaning data to presenting insights that drive decisions. You’ll learn the key skills, common tools (SQL, Python, BI), and how Galaxy can streamline SQL-heavy workflows.
Data analysts transform raw numbers into actionable insights. In a world where every click, swipe, and purchase is logged, companies rely on analysts to make sense of the noise, spot trends, and inform strategy. Whether you’re measuring user growth at a SaaS startup or optimizing supply chains in retail, the analyst’s workflow follows a consistent pattern:
Every project starts with clarifying what needs to be answered and why it matters. Analysts collaborate with stakeholders—product managers, marketers, executives—to translate vague asks ("How healthy is our funnel?") into measurable KPIs ("What is the signup-to-activation conversion rate by cohort?").
Sources include:
In practice, 70–80% of ad-hoc questions are answered with SQL queries. This is where a fast editor such as Galaxy shines: autocomplete, schema browsing, and AI copilot reduce query time dramatically.
Real-world data is messy: duplicates, missing values, inconsistent formats. Analysts:
WHERE price > 0
)COALESCE()
)DATEDIFF
for user age)Cleaning can occur in SQL (CTEs, temp tables), Python (Pandas), or spreadsheets—depending on scale and complexity.
Before jumping to conclusions, analysts explore distributions, correlations, and outliers. Techniques include:
AVG
, MEDIAN
, STDDEV
)Example SQL in Galaxy:
WITH sessions AS (
SELECT user_id,
DATE_TRUNC('day', started_at) AS day,
COUNT(*) AS sessions
FROM app.page_views
WHERE started_at > CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1,2)
SELECT day,
AVG(sessions) AS avg_sessions_per_user
FROM sessions
GROUP BY 1
ORDER BY 1;
Galaxy’s inline chart preview lets you quickly visualize avg_sessions_per_user
as a line chart—catching anomalies instantly.
Depending on skill level, analysts may run:
While heavy modeling often moves to Python or R, many tests (t-tests, chi-square) can be executed directly in SQL or with statistical extensions.
The best analysis fails if stakeholders can’t understand it. Analysts build:
Key is tailoring the story: executives need high-level trends; engineers may want raw tables.
Deliver insights, field follow-up questions, refine assumptions. A feedback loop ensures recommendations are actionable and, ideally, drive measurable impact.
TimeTaskGalaxy Touchpoints9:00Stand-up with product team–9:30Draft metrics for new featureUse Galaxy AI copilot to prototype joins11:00Clean user event dataWrite CTEs; version queries in Git13:00Lunch & learningReview endorsed queries library14:00Visualize retentionInline chart preview15:30Stakeholder review callShare Galaxy link; discuss insights17:00Refactor long running queryUse AI to optimize execution plan
Why it happens: Missing indexes, large scans, suboptimal joins.
Fix:
EXPLAIN
plans in Galaxy (right-click → Explain).Different teams calculate "active users" differently. Solution: centralize definitions. Save the canonical query to a Galaxy Collection and mark as Endorsed.
Use staging tables or views to standardize formats. Document assumptions inside the query with comments.
Scenario: The marketing team wants to know weekly conversion from signup → paid.
saas_db
.WITH signups AS (
SELECT user_id, DATE_TRUNC('week', created_at) AS signup_week
FROM users
),
subscriptions AS (
SELECT user_id, DATE_TRUNC('week', paid_at) AS paid_week
FROM payments
WHERE status = 'paid'
)
SELECT s.signup_week,
COUNT(DISTINCT s.user_id) AS signups,
COUNT(DISTINCT p.user_id) AS paid_users,
ROUND(COUNT(DISTINCT p.user_id)::NUMERIC / COUNT(DISTINCT s.user_id), 2) AS conversion_rate
FROM signups s
LEFT JOIN subscriptions p
ON s.user_id = p.user_id
AND s.signup_week = p.paid_week
GROUP BY 1
ORDER BY 1;
Stretch goal: Use Galaxy’s AI copilot to refactor the query for performance or add cohort breakdown by acquisition channel.
Across each domain, analysts bridge the gap between raw data and strategic action.