Beginners Resources

What Does a Data Analyst Do? Essential Skills, Tasks, and Tools

Welcome to the Galaxy, Guardian!
You'll be receiving a confirmation email

Follow us on twitter :)
Oops! Something went wrong while submitting the form.

This resource breaks down the day-to-day work of a data analyst—from gathering and cleaning data to presenting insights that drive decisions. You’ll learn the key skills, common tools (SQL, Python, BI), and how Galaxy can streamline SQL-heavy workflows.

Table of Contents

Learning Objectives

  • Understand the end-to-end workflow of a data analyst.
  • Identify the core skills—technical and soft—required for the role.
  • Learn how to collect, clean, analyze, and visualize data with real examples.
  • Practice writing SQL in Galaxy to answer business questions.
  • Recognize common pitfalls and learn troubleshooting strategies.

1. Introduction: Why Data Analysis Matters

Data analysts transform raw numbers into actionable insights. In a world where every click, swipe, and purchase is logged, companies rely on analysts to make sense of the noise, spot trends, and inform strategy. Whether you’re measuring user growth at a SaaS startup or optimizing supply chains in retail, the analyst’s workflow follows a consistent pattern:

  1. Define the question
  2. Collect the data
  3. Clean and prepare
  4. Analyze
  5. Visualize & communicate
  6. Iterate

2. Core Responsibilities

2.1 Defining the Business Question

Every project starts with clarifying what needs to be answered and why it matters. Analysts collaborate with stakeholders—product managers, marketers, executives—to translate vague asks ("How healthy is our funnel?") into measurable KPIs ("What is the signup-to-activation conversion rate by cohort?").

2.2 Data Collection

Sources include:

  • Databases (PostgreSQL, Snowflake) via SQL
  • APIs (Google Analytics, Stripe)
  • CSV/Excel uploads
  • Third-party tools (marketing platforms, CRMs)

In practice, 70–80% of ad-hoc questions are answered with SQL queries. This is where a fast editor such as Galaxy shines: autocomplete, schema browsing, and AI copilot reduce query time dramatically.

2.3 Data Cleaning & Preparation

Real-world data is messy: duplicates, missing values, inconsistent formats. Analysts:

  • Filter invalid rows (WHERE price > 0)
  • Handle nulls (COALESCE())
  • Standardize units (kg vs lbs)
  • Create derived fields (e.g., DATEDIFF for user age)

Cleaning can occur in SQL (CTEs, temp tables), Python (Pandas), or spreadsheets—depending on scale and complexity.

2.4 Exploratory Data Analysis (EDA)

Before jumping to conclusions, analysts explore distributions, correlations, and outliers. Techniques include:

  • Descriptive statistics (AVG, MEDIAN, STDDEV)
  • Group-bys and pivots
  • Visual checks—histograms, scatterplots

Example SQL in Galaxy:

WITH sessions AS (
SELECT user_id,
DATE_TRUNC('day', started_at) AS day,
COUNT(*) AS sessions
FROM app.page_views
WHERE started_at > CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1,2)
SELECT day,
AVG(sessions) AS avg_sessions_per_user
FROM sessions
GROUP BY 1
ORDER BY 1;

Galaxy’s inline chart preview lets you quickly visualize avg_sessions_per_user as a line chart—catching anomalies instantly.

2.5 Statistical Analysis & Modeling

Depending on skill level, analysts may run:

  • Hypothesis tests (A/B testing)
  • Regression for trend prediction
  • Clustering to segment users

While heavy modeling often moves to Python or R, many tests (t-tests, chi-square) can be executed directly in SQL or with statistical extensions.

2.6 Visualization & Storytelling

The best analysis fails if stakeholders can’t understand it. Analysts build:

  • Dashboards (Looker, Tableau, Metabase)
  • Reports (Slides, Notion)
  • Ad-hoc charts (Galaxy’s upcoming lightweight viz)

Key is tailoring the story: executives need high-level trends; engineers may want raw tables.

2.7 Communication & Iteration

Deliver insights, field follow-up questions, refine assumptions. A feedback loop ensures recommendations are actionable and, ideally, drive measurable impact.

3. Essential Skills

3.1 Technical Skills

  • SQL – bread and butter of querying; Galaxy accelerates this.
  • Spreadsheet literacy – quick sanity checks, pivot tables.
  • Python/R – for complex data wrangling and stats.
  • BI tools – dashboard creation (Tableau, Looker).
  • Version control – Git for sharing SQL scripts, enabled in Galaxy via GitHub sync.

3.2 Soft Skills

  • Critical thinking – question assumptions.
  • Communication – translate numbers to stories.
  • Domain knowledge – understand product and business context.
  • Collaboration – work with cross-functional teams.

4. Day-in-the-Life Timeline

TimeTaskGalaxy Touchpoints9:00Stand-up with product team–9:30Draft metrics for new featureUse Galaxy AI copilot to prototype joins11:00Clean user event dataWrite CTEs; version queries in Git13:00Lunch & learningReview endorsed queries library14:00Visualize retentionInline chart preview15:30Stakeholder review callShare Galaxy link; discuss insights17:00Refactor long running queryUse AI to optimize execution plan

5. Common Obstacles & Troubleshooting

5.1 Slow Queries

Why it happens: Missing indexes, large scans, suboptimal joins.

Fix:

  • Check EXPLAIN plans in Galaxy (right-click → Explain).
  • Aggregate early (reduce rows), filter with indexed columns.
  • Seek help from Galaxy AI: "Suggest optimizations for this query."

5.2 Conflicting Metrics

Different teams calculate "active users" differently. Solution: centralize definitions. Save the canonical query to a Galaxy Collection and mark as Endorsed.

5.3 Messy Source Data

Use staging tables or views to standardize formats. Document assumptions inside the query with comments.

6. Hands-On Exercise

Scenario: The marketing team wants to know weekly conversion from signup → paid.

  1. Open Galaxy and connect to the sample saas_db.
  2. Run the starter template below:

WITH signups AS (
SELECT user_id, DATE_TRUNC('week', created_at) AS signup_week
FROM users
),
subscriptions AS (
SELECT user_id, DATE_TRUNC('week', paid_at) AS paid_week
FROM payments
WHERE status = 'paid'
)
SELECT s.signup_week,
COUNT(DISTINCT s.user_id) AS signups,
COUNT(DISTINCT p.user_id) AS paid_users,
ROUND(COUNT(DISTINCT p.user_id)::NUMERIC / COUNT(DISTINCT s.user_id), 2) AS conversion_rate
FROM signups s
LEFT JOIN subscriptions p
ON s.user_id = p.user_id
AND s.signup_week = p.paid_week
GROUP BY 1
ORDER BY 1;

  1. Visualize the conversion rate as a bar chart.
  2. Endorse the query in the Marketing Metrics collection.
  3. Share the link with stakeholders.

Stretch goal: Use Galaxy’s AI copilot to refactor the query for performance or add cohort breakdown by acquisition channel.

7. Real-World Applications

  • Product Analytics: Measure feature adoption, funnel drop-offs.
  • Finance: Forecast revenue, analyze churn.
  • Operations: Optimize inventory based on demand patterns.
  • Marketing: Attribute campaign spend to conversions.

Across each domain, analysts bridge the gap between raw data and strategic action.

8. Best Practices

  • Start with a hypothesis; don’t boil the ocean.
  • Version control every query (Galaxy ↔ GitHub sync).
  • Document assumptions & edge cases.
  • Validate data freshness and lineage.
  • Automate recurring reports with scheduled jobs.

Key Takeaways

  • Data analysts follow a repeatable pipeline: define, collect, clean, analyze, visualize, communicate.
  • SQL is the foundational skill—Galaxy accelerates writing, optimizing, and sharing queries.
  • Soft skills are as critical as technical chops; storytelling drives impact.
  • Common pain points include slow queries, inconsistent metrics, and messy data—each solvable with best practices and the right tooling.

Next Steps

  1. Create a free Galaxy workspace and connect a sample database.
  2. Work through at least three business questions using SQL and the AI copilot.
  3. Publish your queries to Collections and invite a peer for feedback.
  4. Dive deeper into statistics: experiment with A/B test analysis.

Check out some other beginners resources