Plotting Value Counts with Seaborn

How do I plot value counts with Seaborn?

You plot value counts by visualizing the frequency distribution of a categorical or discrete variable with Seaborn's high-level categorical plots, such as sns.countplot() and sns.histplot().

Description

Plotting value counts—also known as frequency distributions—is one of the most common first steps in exploratory data analysis (EDA). When you can see how often each category or integer value occurs, you quickly spot dominant classes, rare events, class imbalance, and potential data-quality issues. Seaborn, the statistical plotting library built on top of Matplotlib, provides a concise and aesthetically pleasing API for these visualizations, saving you from writing boilerplate code.

Why Value-Count Plots Matter

Whether you're building a classification model, running A/B tests, or summarizing user behavior, knowing the distribution of a categorical feature is critical. For instance, if your churn dataset has 95% "active" and 5% "churned" customers, you might opt for stratified sampling or for metrics that are robust to imbalance, such as precision and recall. Likewise, product teams often want to know how a new feature is being adopted in each user segment. Value-count plots give an immediate, intuitive answer.

Seaborn’s Main Tools for Frequency Plots

1. sns.countplot()

countplot() is Seaborn's dedicated helper for one-dimensional frequency plots. It counts the observations in each category for you and renders a bar plot whose bar heights are those counts, so no manual aggregation is needed.

  • Minimal syntax: sns.countplot(data=df, x="col")
  • Hue dimension: sns.countplot(data=df, x="col", hue="segment")
  • Orientation: Set y instead of x or pass orient="h"
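The three call patterns above can be sketched side by side. The DataFrame below is made-up toy data, and the column names "col" and "segment" are placeholders taken from the bullets, not from a real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data; column names are illustrative only.
df = pd.DataFrame({
    "col": ["a", "b", "a", "c", "a", "b"],
    "segment": ["free", "paid", "paid", "free", "free", "free"],
})

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Minimal syntax: one bar per distinct value of "col".
sns.countplot(data=df, x="col", ax=axes[0])

# Hue dimension: bars are split by "segment" within each value of "col".
sns.countplot(data=df, x="col", hue="segment", ax=axes[1])

# Horizontal orientation: assign the variable to y instead of x.
sns.countplot(data=df, y="col", ax=axes[2])

# Read the bar heights of the first panel back from its bar container.
heights = sorted(int(bar.get_height()) for bar in axes[0].containers[0])
```

Passing ax= lets several count plots share one figure, which is handy for a quick comparison before reaching for a figure-level function.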

2. sns.histplot()

Although often used for continuous data, histplot() with discrete=True works for integer-encoded categories. It aggregates observations into histogram bins of width 1 and aligns bars to integer ticks.

3. sns.catplot()

catplot() is a figure-level interface that draws a count plot when you pass kind="count", while letting you facet by additional variables with row and col (and still split bars within each panel using hue). This is well suited to small-multiples dashboards.

Step-by-Step Workflow

1. Inspect Raw Categories

df["user_type"].value_counts(dropna=False)

Before plotting, confirm that categories look sensible, are consistently cased, and include NaN values only where expected.
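A minimal sketch of that check, using invented "user_type" values with inconsistent casing, a trailing space, and one missing entry. The cleanup shown (strip plus lower) is only one possible normalization and may not suit every column.

```python
import pandas as pd

# Hypothetical raw column with casing and whitespace inconsistencies.
df = pd.DataFrame({"user_type": ["Free", "free ", "Paid", None, "paid", "Free"]})

# Raw view: every spelling variant shows up as its own category.
raw_counts = df["user_type"].value_counts(dropna=False)

# Normalize whitespace and case; missing values stay missing.
cleaned = df["user_type"].str.strip().str.lower()
clean_counts = cleaned.value_counts(dropna=False)

print(raw_counts)
print(clean_counts)
```

Comparing the two outputs makes it obvious when several bars in a later plot would really be the same category.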

2. Pick the Right Plot Function

If the feature is a string or pandas category, reach for countplot. If it’s integer codes from 0–N with no gaps, both countplot and histplot(discrete=True) are valid.

3. Decide on Order and Palette

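order = df["user_type"].value_counts().index
# Seaborn 0.13+: assign hue when passing palette; on older versions drop hue= and legend=
sns.countplot(data=df, x="user_type", hue="user_type", order=order,
              palette="viridis", legend=False)

Sorting bars by frequency (order=list(df["col"].value_counts().keys())) replaces Seaborn's default arrangement, which is order of appearance for string columns and category order for categoricals, with one that is easier to read.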

4. Add Percent Labels (Optional)

ax = sns.countplot(x="user_type", data=df)
ax.bar_label(ax.containers[0], fmt=lambda c: f"{100*c/len(df):.1f}%")

Percent annotations make comparisons visually immediate. Note that passing a callable as fmt requires Matplotlib 3.7 or later.
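When the plot also has a hue, ax.containers holds one container per hue level, so labeling only containers[0] annotates a single group. The sketch below loops over every container and passes precomputed strings through labels=, which also avoids relying on a callable fmt. The data and the column names are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data with two categorical columns.
df = pd.DataFrame({
    "user_type": ["free", "free", "free", "paid", "paid", "trial"],
    "country": ["US", "DE", "US", "US", "DE", "DE"],
})

ax = sns.countplot(data=df, x="user_type", hue="country")

total = len(df)
for container in ax.containers:
    # One label per bar; empty or missing bars get no label.
    labels = [
        f"{100 * h / total:.1f}%" if h > 0 else ""
        for h in container.datavalues
    ]
    ax.bar_label(container, labels=labels)

# Collect the non-empty annotation texts for inspection.
texts = sorted(t.get_text() for t in ax.texts if t.get_text())
```

Here each percentage is a share of all rows. Dividing by the size of each x group instead would give within-category shares, which answers a different question.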

5. Combine with Another Dimension

sns.countplot(data=df, x="plan", hue="country", dodge=True)

Adding hue turns the graph into a grouped bar chart of a two-way contingency table: each bar is the count for one combination of plan and country.
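To make the link to a contingency table concrete, the following sketch computes the same counts with pd.crosstab before plotting. The "plan" and "country" values are toy data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical toy data.
df = pd.DataFrame({
    "plan": ["basic", "basic", "basic", "pro", "pro", "pro"],
    "country": ["US", "US", "DE", "US", "DE", "DE"],
})

# Rows are plans, columns are countries, cells are row counts.
table = pd.crosstab(df["plan"], df["country"])
print(table)

# Each bar in the grouped plot corresponds to one cell of the table.
ax = sns.countplot(data=df, x="plan", hue="country", dodge=True)
```

Printing the table next to the figure is a cheap way to verify that the plot shows what you expect.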

Advanced Techniques

A. Faceting with catplot()

sns.catplot(data=df, x="plan", col="device", col_wrap=3, kind="count")

This call produces a grid of small bar charts, each showing plan distribution by device type. The result is easier to scan than a single over-crowded figure.

B. Transforming Counts to Rates

Sometimes you want a relative frequency chart. Compute proportions beforehand and feed them into barplot().

props = (
    df["segment"].value_counts(normalize=True)
    .rename_axis("segment")
    .reset_index(name="pct")
)

sns.barplot(data=props, x="segment", y="pct")

C. Multiple Response Variables

If a record can belong to multiple categories (for instance, a user's set of interests), reshape the DataFrame into long format so that each row holds one record-category pair, for example with DataFrame.explode() or melt(), and plot counts on the resulting rows.
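A minimal sketch of this reshaping, assuming the interests are stored as Python lists in a single column. The "users" frame and its column names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical multi-response data: one list of interests per user.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "interests": [["sql", "python"], ["sql"], ["python", "viz", "sql"]],
})

# One row per (user, interest) pair; ignore_index avoids duplicate index labels.
long = users.explode("interests", ignore_index=True)

interest_counts = long["interests"].value_counts()
ax = sns.countplot(data=long, x="interests", order=interest_counts.index)
```

Because one user can contribute several rows, the bar heights sum to more than the number of users. Any percentage should state whether its denominator is users or responses.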

Best Practices

  • Use pandas Categorical to guarantee category order and to keep levels that have no observations visible on the axis.
  • Add context with meaningful axis labels and a descriptive title.
  • Limit category cardinality; aggregate rare levels into “Other” when bars become unreadable.
  • Respect color-blind palettes; Seaborn’s color_palette("colorblind") is your friend.
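Several of these practices can be combined in a few lines. The sketch below uses an invented "priority" column in which the "Medium" level never occurs, and the labels and title are placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd
import seaborn as sns

# Hypothetical data: "Medium" is a valid level but has no observations.
levels = ["Low", "Medium", "High"]
df = pd.DataFrame({"priority": ["High", "Low", "High", "High", "Low"]})

# An ordered Categorical fixes the bar order and keeps the empty level.
df["priority"] = pd.Categorical(df["priority"], categories=levels, ordered=True)

# sort=False returns counts in category order, including zero counts.
counts = df["priority"].value_counts(sort=False)

# Use a single color from the color-blind-friendly palette.
ax = sns.countplot(data=df, x="priority",
                   color=sns.color_palette("colorblind")[0])
ax.set(xlabel="Ticket priority", ylabel="Number of tickets",
       title="Tickets by priority (toy data)")
```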

Common Pitfalls and How to Avoid Them

Pitfall 1 – Calling value_counts() First and Losing the Index

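Beginners often run df["col"].value_counts() and pass the resulting Series to sns.barplot(). This works, but the category labels now live in the Series index, so extra code is needed to turn them into a column, and the axis labels that countplot() would infer from the column name are lost. Unless you need the aggregated values for something else, call countplot() on the raw column directly.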

Pitfall 2 – Forgetting to Sort or Specify Order

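For string columns, Seaborn plots categories in the order they first appear in the data; for pandas categorical columns, it uses the category order, which is alphabetical when the categories were inferred automatically. Neither is usually the most informative arrangement. Sorting by frequency or by meaning (e.g., "Low < Medium < High") communicates the story better.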

Pitfall 3 – Using histplot() on Strings

histplot() does accept string and categorical columns and draws one bar per level, but it is designed around binning numeric data: it has no order parameter, and the bars touch unless you pass shrink. For text categories, countplot() is the more direct choice.

Real-World Example: Titanic Survival

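import matplotlib.pyplot as plt
import seaborn as sns

# Load the sample dataset (downloaded on first use)
titanic = sns.load_dataset("titanic")

# Plot passenger class distribution, split by survival
ax = sns.countplot(data=titanic, x="class", hue="survived",
                   palette="Set2")
ax.set_title("Passenger Class vs. Survival on the Titanic")
ax.set_xlabel("Ticket Class")
ax.set_ylabel("Number of Passengers")

# Relabel the legend in place so colors stay matched to the 0/1 codes
legend = ax.get_legend()
legend.set_title("Survived")
for text, label in zip(legend.get_texts(), ["No", "Yes"]):
    text.set_text(label)
plt.show()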

The result instantly reveals that most passengers were in 3rd class and that survival probability was lower there, guiding further statistical modelling.

Wrapping Up

Seaborn’s countplot, histplot, and catplot provide a powerful, boilerplate-free way to visualize categorical frequencies. By attending to ordering, labeling, and context, you can turn simple bar charts into compelling narratives that steer both business and technical decisions.

Why Plotting Value Counts with Seaborn Is Important

Visualizing value counts is the fastest way to detect class imbalance, rare categories, and data-quality issues. In data engineering, these insights influence sampling strategies, model choice, and storage optimizations. Seaborn automates frequency aggregation and styling, allowing engineers to focus on analysis rather than plotting boilerplate, which accelerates exploratory workflows and promotes data-driven decisions.

Plotting Value Counts with Seaborn Example Usage


# Assumes titanic = sns.load_dataset("titanic"), as in the example above
sns.countplot(data=titanic, x="class")

Frequently Asked Questions (FAQs)

What is the difference between sns.countplot() and sns.histplot()?

countplot() is purpose-built for categorical variables (strings or categories) and automatically counts occurrences. histplot() is primarily aimed at numeric data and bins values; for integer-coded categories, set discrete=True so that each integer gets its own unit-width bin.

How can I display percentages instead of raw counts?

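Compute proportions with value_counts(normalize=True), convert the resulting Series to a DataFrame with reset_index(), and pass it to sns.barplot(). Alternatively, annotate the countplot bars with ax.bar_label(). In Seaborn 0.13 and later, countplot() also accepts stat="percent".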

Why are my bars ordered alphabetically and how do I fix it?

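Seaborn uses the category order of a pandas categorical column, which is lexicographic when the categories were inferred automatically (for example via astype("category")); plain string columns are plotted in order of first appearance. Provide an order parameter or convert the column to pandas.Categorical with an explicit category list.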

How do I handle a categorical column with dozens of rare categories?

Aggregate infrequent levels into an "Other" bucket before plotting, or filter the DataFrame to show only the top N categories for readability.
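One way to do the aggregation is sketched below. The helper name lump_rare and the browser data are hypothetical, and the choice to keep the top N levels (rather than applying a minimum-share threshold) is an assumption.

```python
import pandas as pd

def lump_rare(series: pd.Series, top_n: int, other_label: str = "Other") -> pd.Series:
    """Keep the top_n most frequent values and replace the rest with other_label.

    Note: missing values are not among the top levels, so they are also
    relabeled; handle them beforehand if they should stay missing.
    """
    top = series.value_counts().nlargest(top_n).index
    return series.where(series.isin(top), other_label)

# Hypothetical column with a long tail of rare categories.
browsers = pd.Series(
    ["chrome"] * 5 + ["safari"] * 3 + ["firefox"] * 2
    + ["opera", "brave", "vivaldi"]
)

lumped = lump_rare(browsers, top_n=3)
lumped_counts = lumped.value_counts()
print(lumped_counts)
```

The lumped Series can then be passed to sns.countplot() like any other column, for example via x=lumped.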