Data Visualization Basics: Turning Data Into Insight

This lesson introduces the core principles and practices of data visualization. You’ll learn why visualizing data is essential, how to select the right chart, and how to build clear, compelling visuals using Python and SQL. We also show how Galaxy’s SQL editor can preview charts straight from query results, giving you an end-to-end workflow.

Learning Objectives

  • Define data visualization and explain its importance.
  • Recognize common chart types and the questions they answer.
  • Apply best-practice design principles to avoid misleading visuals.
  • Create basic visualizations in Python (Matplotlib) and directly from SQL with Galaxy.
  • Diagnose and correct common visualization mistakes.

1. What Is Data Visualization?

Data visualization is the practice of translating raw data into graphical representations, such as a bar chart, line graph, map, or bespoke interactive graphic, that reveal patterns, trends, and outliers faster than rows of numbers can. Differences in position, length, and color register almost at a glance, which makes charts indispensable for both analysis and communication.

1.1 Why It Matters

  • Cognitive efficiency: We spot trends at a glance.
  • Storytelling: Charts anchor narratives, persuading stakeholders.
  • Error detection: Visual anomalies often reveal data quality issues.
  • Democratization: Non-technical teammates can digest insights without reading SQL.

2. Foundational Concepts

2.1 Data Types and Encodings

Match data type to visual channel:

  • Quantitative → position on a common scale, length, angle (roughly in decreasing order of how precisely readers compare them).
  • Categorical → color hue, facets.
  • Temporal → horizontal position, animation.

2.2 Chart Selection Cheat-Sheet

Question → Best chart
  • Compare categories → Bar, column, or dot plot
  • Show change over time → Line or area
  • Part-to-whole → Stacked bar or treemap (avoid 3-D pies!)
  • Distribution → Histogram, box plot, or violin
  • Relationship → Scatter, bubble, or heat map
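Most rows of the cheat-sheet can be sketched as a rule-of-thumb function keyed on column data types. This is a simplified illustration, not a complete decision procedure: `pick_chart` is a hypothetical helper, the three type names are assumptions, and part-to-whole is left out because it depends on intent (do the parts sum to a meaningful total?) rather than on column types alone.

```python
def pick_chart(x_type, y_type=None):
    """Suggest a chart from column data types (hypothetical rule-of-thumb helper).

    Types are assumed to be "temporal", "categorical", or "quantitative".
    """
    if y_type is None:
        # A single numeric column: look at its distribution.
        if x_type == "quantitative":
            return "histogram"
        # A single categorical column: compare counts per category.
        if x_type == "categorical":
            return "bar"
        raise ValueError(f"no single-column rule for {x_type!r}")
    if y_type != "quantitative":
        raise ValueError("this sketch only handles a quantitative y column")
    rules = {
        "temporal": "line",         # show change over time
        "categorical": "bar",       # compare categories
        "quantitative": "scatter",  # relationship between two measures
    }
    if x_type not in rules:
        raise ValueError(f"unknown x type: {x_type!r}")
    return rules[x_type]


print(pick_chart("temporal", "quantitative"))  # → line
print(pick_chart("quantitative"))              # → histogram
```

Real data still needs judgment: a categorical axis with many levels may read better as a dot plot, and a heavily skewed distribution may be clearer as a box plot.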

3. Hands-On Example: Python + Matplotlib

Suppose you have monthly revenue data and want to visualize growth.

import pandas as pd
import matplotlib.pyplot as plt

# Six months of revenue in USD
data = {
    "month": ["Jan", "Feb", "Mar", "Apr", "May", "Jun"],
    "revenue": [22_000, 24_500, 27_000, 31_200, 35_100, 40_300],
}
df = pd.DataFrame(data)

# Plot revenue in thousands so the values match the y-axis label
plt.figure(figsize=(8, 4))
plt.plot(df["month"], df["revenue"] / 1000, marker="o", color="#4F6D7A")
plt.title("Monthly Revenue, H1 2024")
plt.xlabel("Month")
plt.ylabel("Revenue (USD, thousands)")
plt.tight_layout()  # keep labels from being clipped
plt.show()

What to observe: Revenue grew by $2,500 in each of February and March, then by $3,900 to $5,200 a month from April on. That steeper slope after March is a cue to look for drivers such as a product launch.
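To confirm the acceleration numerically rather than by eye, a short plain-Python sketch (no pandas needed) computes the month-over-month change for the same six values. The variable names are illustrative.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [22_000, 24_500, 27_000, 31_200, 35_100, 40_300]

# Pair each month with the previous month's revenue and compute the change.
changes = []
for month, prev, cur in zip(months[1:], revenue, revenue[1:]):
    delta = cur - prev                   # absolute change in USD
    pct = round(100 * delta / prev, 1)   # percentage change vs. previous month
    changes.append((month, delta, pct))

for month, delta, pct in changes:
    print(f"{month}: +{delta:,} USD ({pct}%)")
```

The absolute gains jump from $2,500 to between $3,900 and $5,200 after March, while percentage growth is noisier (it dips to 12.5% in May), so it is worth checking which kind of acceleration a chart is actually showing.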

4. Query-to-Chart in Galaxy

4.1 Setup

  1. Connect Galaxy to your Postgres or Snowflake database.
  2. Create a sales Collection to store revenue queries.

4.2 Write the Query

SELECT
    DATE_TRUNC('month', order_date) AS month,
    -- 1000.0 (not 1000) avoids integer division if total_amount is an integer column
    SUM(total_amount) / 1000.0 AS revenue_k
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY 1
ORDER BY 1;
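To make the query's semantics concrete, here is a plain-Python sketch of what it computes, assuming each order is an `(order_date, total_amount)` pair with `order_date` already a `date` (so time zones are ignored). The sample rows and the `monthly_revenue_k` name are made up for illustration.

```python
from collections import defaultdict
from datetime import date


def monthly_revenue_k(orders, start=date(2024, 1, 1)):
    """Mimic the SQL: filter, DATE_TRUNC to month, SUM, divide by 1000, ORDER BY."""
    totals = defaultdict(float)
    for order_date, total_amount in orders:
        if order_date >= start:                  # WHERE order_date >= '2024-01-01'
            month = order_date.replace(day=1)    # DATE_TRUNC('month', order_date)
            totals[month] += total_amount        # SUM(total_amount) ... GROUP BY 1
    # Float division by 1000, then sort by month (ORDER BY 1).
    return [(month, totals[month] / 1000) for month in sorted(totals)]


# Hypothetical sample rows: (order_date, total_amount)
orders = [
    (date(2023, 12, 31), 500.0),  # excluded by the WHERE clause
    (date(2024, 1, 5), 1200.0),
    (date(2024, 1, 20), 800.0),
    (date(2024, 3, 2), 1500.0),
]
print(monthly_revenue_k(orders))
```

February is missing from the output because `GROUP BY` only returns months that have at least one order; plotted as-is, a line chart would draw January straight to March. Filling such gaps is covered under Common Mistakes.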

4.3 Visualize in One Click

  1. Run the query in Galaxy’s editor (⌘+Enter).
  2. Click the Chart tab above the results.
  3. Galaxy auto-detects month as the X-axis and revenue_k as the Y-axis. Choose a Line chart.
  4. Save the chart and query to the sales Collection, click Endorse, and share the link with Finance.
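How Galaxy decides which column goes on which axis is not described here, so treat the following as a hypothetical heuristic of the kind a chart preview might use, not Galaxy's implementation: a date-like column becomes X (falling back to the first text column) and numeric columns become Y series. `Decimal` is included because Postgres drivers such as psycopg return `NUMERIC` values that way; the function names are made up.

```python
from datetime import date
from decimal import Decimal


def column_kind(values):
    """Classify non-null values as 'temporal', 'numeric', or 'text'."""
    present = [v for v in values if v is not None]
    # datetime is a subclass of date, so this covers timestamps too.
    if present and all(isinstance(v, date) for v in present):
        return "temporal"
    # bool is a subclass of int, so exclude it from numeric columns.
    if present and all(
        isinstance(v, (int, float, Decimal)) and not isinstance(v, bool)
        for v in present
    ):
        return "numeric"
    return "text"


def detect_axes(columns, rows):
    """Guess (x_column, y_columns) from a result set (hypothetical heuristic)."""
    kinds = [column_kind([row[i] for row in rows]) for i in range(len(columns))]
    # Prefer a temporal column for the x-axis, else the first text column.
    x_index = next((i for i, k in enumerate(kinds) if k == "temporal"), None)
    if x_index is None:
        x_index = next((i for i, k in enumerate(kinds) if k == "text"), None)
    x_column = columns[x_index] if x_index is not None else None
    # Every numeric column becomes a candidate y series.
    y_columns = [c for c, k in zip(columns, kinds) if k == "numeric"]
    return x_column, y_columns


rows = [(date(2024, 1, 1), Decimal("22.0")), (date(2024, 2, 1), Decimal("24.5"))]
print(detect_axes(["month", "revenue_k"], rows))  # → ('month', ['revenue_k'])
```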

This workflow keeps SQL, visualization, and collaboration in a single source of truth, with no ad-hoc CSV exports.

5. Best Practices & Design Principles

5.1 Keep It Simple

Avoid chart junk—gradients, 3-D effects, excessive gridlines.

5.2 Use Color Intentionally

  • Limit the palette to 6-8 hues.
  • Reserve a bright accent color for the key data series.
  • Check color-blind friendliness with tools like ColorBrewer.
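One way to put these rules into practice is a small helper that either cycles through a colorblind-safe palette or grays out everything except one highlighted series. The sketch swaps in the Okabe-Ito palette (eight hues chosen to remain distinguishable under common color-vision deficiencies) as an example; the palette choice, the gray value, and the `assign_colors` name are assumptions. The resulting hex strings can be passed to Matplotlib's `color=` argument.

```python
# Okabe-Ito palette: eight hues intended to stay distinguishable for
# people with common forms of color-vision deficiency.
OKABE_ITO = [
    "#E69F00",  # orange
    "#56B4E9",  # sky blue
    "#009E73",  # bluish green
    "#F0E442",  # yellow
    "#0072B2",  # blue
    "#D55E00",  # vermillion
    "#CC79A7",  # reddish purple
    "#000000",  # black
]
ACCENT = "#D55E00"  # vermillion, reserved for the key series
MUTED = "#B0B0B0"   # neutral gray for context series


def assign_colors(series_names, highlight=None):
    """Map series names to hex colors (hypothetical helper)."""
    if highlight is not None:
        if highlight not in series_names:
            raise ValueError(f"{highlight!r} is not one of the series")
        # Only the highlighted series gets the bright accent.
        return {name: (ACCENT if name == highlight else MUTED) for name in series_names}
    if len(series_names) > len(OKABE_ITO):
        raise ValueError(
            f"{len(series_names)} series exceed {len(OKABE_ITO)} hues; "
            "consider facets or grouping small categories"
        )
    return dict(zip(series_names, OKABE_ITO))


print(assign_colors(["web", "iOS", "Android"], highlight="iOS"))
```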

5.3 Label Clearly

Axes, units, and annotations should be self-explanatory. If the audience has to guess, you’ve lost them.

5.4 Maintain Proportional Scales

Always start bar charts at zero; use consistent intervals on time axes.
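The arithmetic shows why a non-zero baseline misleads: a bar's drawn length is its value minus the axis minimum, so cutting off the bottom of the axis inflates the visible difference. The values 50 and 55 and the `apparent_ratio` name are illustrative.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths for values a and b when the axis starts at baseline."""
    if min(a, b) <= baseline:
        raise ValueError("both values must lie above the baseline")
    # Each bar is drawn from the baseline up to its value.
    return (b - baseline) / (a - baseline)


true_ratio = apparent_ratio(50, 55)                # axis starts at zero
drawn_ratio = apparent_ratio(50, 55, baseline=45)  # axis truncated at 45
print(f"true ratio {true_ratio:.2f}, drawn ratio {drawn_ratio:.2f}")
# → true ratio 1.10, drawn ratio 2.00
```

A 10% difference looks like a doubling once the axis starts at 45. In Matplotlib, `ax.set_ylim(bottom=0)` pins a bar chart's value axis at zero.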

6. Common Mistakes & Troubleshooting

  • Pie chart overload: Too many slices? Switch to a bar chart.
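  • Dual y-axes: Two scales on one chart invite false comparisons; normalize the data (for example, index both series to 100) or use separate charts.
  • Misaligned time series: Mixed time zones and missing dates distort lines. Convert timestamps to a single time zone and fill gaps before plotting.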
  • Overplotting in scatterplots: Use transparency (alpha) or density contours.
  • Neglecting data quality: Visualizing bad data amplifies errors. Validate, then visualize.
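For the missing-dates problem, one approach is to rebuild the full run of months and insert a placeholder wherever a month is absent. This sketch assumes a dict keyed by first-of-month `date` objects; the helper names and sample values are hypothetical.

```python
from datetime import date


def month_starts(start, end):
    """Yield the first day of every month from start's month through end's month."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield date(year, month, 1)
        month += 1
        if month == 13:  # roll over into January of the next year
            year, month = year + 1, 1


def fill_missing_months(series, fill_value=0.0):
    """Return (month, value) pairs covering every month between the first and last keys."""
    if not series:
        return []
    first, last = min(series), max(series)
    return [(m, series.get(m, fill_value)) for m in month_starts(first, last)]


revenue_k = {date(2024, 1, 1): 2.0, date(2024, 3, 1): 1.5}  # no February entry
for month, value in fill_missing_months(revenue_k):
    print(month.isoformat(), value)
```

Zero is the right fill for sums such as revenue (no orders means no revenue). For averages or rates, `fill_value=float("nan")` is safer, since Matplotlib draws NaN as a break in the line instead of a misleading dip.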

7. Practice Exercises

  1. Bar Chart: Using the orders table, visualize revenue by product category.
  2. Distribution: Plot a histogram of order sizes; identify skewness.
  3. SQL + Galaxy: Build a stacked area chart of monthly active users (MAU) by platform (web, iOS, Android). Endorse and share.
  4. Python Challenge: Load the public tips dataset from Seaborn and create a scatterplot of total_bill vs tip colored by day. Add a trendline.
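For the trendline in the Python challenge, the usual choice is an ordinary least-squares fit. The sketch computes the slope and intercept from scratch so the math is visible; the points are made-up stand-ins for `total_bill` and `tip`, not the real dataset, and `least_squares_line` is a hypothetical name.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line through the points."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values are all identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up (total_bill, tip) pairs, for illustration only.
bills = [10.0, 20.0, 30.0, 40.0]
tips = [2.0, 3.5, 5.0, 6.5]
slope, intercept = least_squares_line(bills, tips)
print(f"tip ~ {slope:.2f} * total_bill + {intercept:.2f}")
```

On Python 3.10+, `statistics.linear_regression(xs, ys)` returns the same slope and intercept, and Seaborn's `regplot` can fit and draw the line for you once `sns.load_dataset("tips")` has loaded the real data.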

8. Key Takeaways

  • Data visualization is about clarity and insight, not decoration.
  • Choose chart types based on the analytical question.
  • Follow design principles—minimalism, intentional color, clear labeling.
  • Tools abound (Python, R, BI platforms), but Galaxy lets you preview charts straight from SQL, unifying workflow and governance.
  • Practice: the fastest way to level up is to visualize your own data, iterate, and solicit feedback.

Next Steps

Advance to intermediate topics like interactive dashboards, geospatial mapping, or storytelling with multiple linked charts. Galaxy’s roadmap includes lightweight visual builder features—get started now so you’re ready when they land!

Additional Resources

  • Storytelling with Data by Cole Nussbaumer Knaflic
  • Fundamentals of Data Visualization by Claus O. Wilke (free online)
  • Python libraries: Matplotlib, Seaborn, Plotly
  • Observable Plot (JavaScript) for interactive web visuals