Plotting value counts with Seaborn means visualizing the frequency distribution of categorical or discrete variables with Seaborn's high-level plotting functions, such as sns.countplot() and sns.histplot().
Plotting value counts—also known as frequency distributions—is one of the most common first steps in exploratory data analysis (EDA). When you can see how often each category or integer value occurs, you quickly spot dominant classes, rare events, class imbalance, and potential data-quality issues. Seaborn, the statistical plotting library built on top of Matplotlib, provides a concise and aesthetically pleasing API for these visualizations, saving you from writing boilerplate code.
Whether you’re building a classification model, running A/B tests, or summarizing user behavior, knowing the distribution of a categorical feature is critical. For instance, if your churn dataset has 95 % “active” and 5 % “churned” customers, you might opt for stratified sampling or specialized metrics. Likewise, product teams often want to know how a new feature is being adopted in each user segment. Value-count plots give an immediate, intuitive answer.
countplot() is Seaborn's dedicated helper for one-dimensional frequency plots. It counts the observations in each category of the input column, then renders a bar plot whose bar heights are those counts.
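Basic usage: sns.countplot(data=df, x="col")
Grouped by a second variable: sns.countplot(data=df, x="col", hue="segment")
Horizontal bars: assign the column to y instead of x (orient="h" is mainly relevant for wide-form input)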
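As a minimal sketch of the calls listed above, the snippet below draws the same column vertically and horizontally and checks that the bar sizes equal the pandas value counts. The toy DataFrame and its column names are hypothetical, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "plan": ["free", "pro", "free", "team", "free", "pro"],
    "segment": ["new", "new", "old", "old", "new", "old"],
})

fig, (ax_v, ax_h) = plt.subplots(1, 2, figsize=(8, 3))

# Vertical bars: one bar per distinct value of "plan".
sns.countplot(data=df, x="plan", ax=ax_v)

# Horizontal bars: assign the column to y instead of x.
sns.countplot(data=df, y="plan", ax=ax_h)

# Bar heights (vertical) and bar widths (horizontal) are the raw counts.
vertical_counts = sorted(int(v) for c in ax_v.containers for v in c.datavalues)
horizontal_counts = sorted(int(v) for c in ax_h.containers for v in c.datavalues)
expected_counts = sorted(df["plan"].value_counts().tolist())
```

Both plots carry the same numbers as df["plan"].value_counts(); only the orientation differs.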
Although more often used for continuous data, histplot() with discrete=True works for integer-encoded categories. It aggregates observations into histogram bins of width 1 and aligns bars to integer ticks.
catplot() is a figure-level interface that can wrap countplot behind the scenes (via kind="count") while letting you facet by additional variables using row or col, and group within each facet using hue. This is well suited to small-multiples dashboards.
df["user_type"].value_counts(dropna=False)
Before plotting, confirm that categories look sensible, are consistently cased, and include NaN values only where expected.
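A sketch of that sanity check, assuming a hypothetical user_type column with mixed casing, stray whitespace, and one missing value. It compares the raw counts with the counts after a simple normalization step.

```python
import pandas as pd

# Hypothetical raw column with inconsistent casing, whitespace, and a missing value.
df = pd.DataFrame({"user_type": ["Free", "free ", "PRO", "pro", None, "free"]})

# dropna=False keeps the missing value visible in the tally.
raw_counts = df["user_type"].value_counts(dropna=False)

# Normalize labels; missing values stay missing instead of becoming text.
df["user_type_clean"] = df["user_type"].str.strip().str.lower()
clean_counts = df["user_type_clean"].value_counts(dropna=False)

print(raw_counts)
print(clean_counts)
```

Six apparent levels collapse to two real ones plus the missing value, which is the kind of issue that is easier to fix before plotting than after.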
If the feature is a string or pandas category, reach for countplot. If it’s integer codes from 0–N with no gaps, both countplot and histplot(discrete=True) are valid.
order = df["user_type"].value_counts().index
sns.countplot(data=df, x="user_type", order=order, palette="viridis")
Sorting bars by frequency (order=list(df["col"].value_counts().keys())) avoids arbitrary or purely alphabetic arrangements.
ax = sns.countplot(x="user_type", data=df)
ax.bar_label(ax.containers[0], fmt=lambda c: f"{100*c/len(df):.1f}%")
Percent annotations make comparisons visually immediate.
sns.countplot(data=df, x="plan", hue="country", dodge=True)
Adding hue turns the graph into a two-way contingency visualization: a grouped bar chart with one bar per combination of plan and country.
sns.catplot(data=df, x="plan", col="device", col_wrap=3, kind="count")
This call produces a grid of small bar charts, each showing plan distribution by device type. The result is easier to scan than a single over-crowded figure.
Sometimes you want a relative frequency chart. Compute proportions beforehand and feed them into barplot().
props = (df["segment"].value_counts(normalize=True)
.rename_axis("segment")
.reset_index(name="pct"))
sns.barplot(data=props, x="segment", y="pct");
If a record can belong to multiple categories (for instance, a user’s set of interests), reshape the DataFrame into long format, for example with DataFrame.explode() or melt(), and plot counts on the exploded rows.
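One way to sketch the multi-label case, assuming each row stores its labels as a Python list in a hypothetical interests column. DataFrame.explode() produces one row per (user, interest) pair, which countplot can then tally.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical multi-label data: each user has a list of interests.
users = pd.DataFrame({
    "user_id": [1, 2, 3],
    "interests": [["music", "sports"], ["music"], ["travel", "music", "sports"]],
})

# One row per (user, interest) pair; reset the index because explode repeats it.
long_df = (users.explode("interests")
                .rename(columns={"interests": "interest"})
                .reset_index(drop=True))

order = long_df["interest"].value_counts().index
fig, ax = plt.subplots()
sns.countplot(data=long_df, x="interest", order=order, ax=ax)

interest_counts = long_df["interest"].value_counts().to_dict()
```

Because a user can contribute several rows, the bars sum to the number of labels, not the number of users.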
For accessible colors, sns.color_palette("colorblind") is your friend.
Beginners often run df["col"].value_counts() and pass the resulting Series to sns.barplot(). While functional, you lose the axis label inferred by countplot(), and extra code is required to convert the Series to two columns. Instead, call countplot directly.
By default Seaborn uses the category order of a pandas Categorical column, or the order in which values first appear for plain string columns. Neither is guaranteed to be meaningful. Sorting by frequency or meaning (e.g., “Low < Medium < High”) communicates the story better.
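A minimal sketch of ordering by meaning, assuming a hypothetical risk column with three levels. Converting the column to an ordered pandas Categorical fixes the bar order without passing order on every call.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data whose first-appearance order (High, Low, Medium) is unhelpful.
df = pd.DataFrame({"risk": ["High", "Low", "Medium", "Low", "High", "Low"]})

# Declare the meaningful order once, on the column itself.
levels = ["Low", "Medium", "High"]
df["risk"] = pd.Categorical(df["risk"], categories=levels, ordered=True)

fig, ax = plt.subplots()
sns.countplot(data=df, x="risk", ax=ax)
fig.canvas.draw()  # make sure tick labels are populated before reading them

tick_labels = [t.get_text() for t in ax.get_xticklabels()]
heights = [int(v) for c in ax.containers for v in c.datavalues]
```

Passing order=levels to countplot is an equivalent alternative when changing the column's dtype is not desirable.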
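histplot() is built around binning numeric data. Recent Seaborn releases also accept string or categorical columns and draw one bar per level, but histplot() offers no order parameter, so the result is harder to control. Use countplot() for text categories, or convert the column to categorical codes first.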
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Load sample dataset
titanic = sns.load_dataset("titanic")
# Plot passenger class distribution, split by survival
sns.countplot(data=titanic, x="class", hue="survived",
              palette="Set2")
plt.title("Passenger Class vs. Survival on the Titanic")
plt.xlabel("Ticket Class")
plt.ylabel("Number of Passengers")
plt.legend(title="Survived", labels=["No", "Yes"])
plt.show()
The result instantly reveals that most passengers were in 3rd class and that survival probability was lower there, guiding further statistical modelling.
Seaborn’s countplot, histplot, and catplot provide a powerful, boilerplate-free way to visualize categorical frequencies. By attending to ordering, labeling, and context, you can turn simple bar charts into compelling narratives that steer both business and technical decisions.
Visualizing value counts is the fastest way to detect class imbalance, rare categories, and data-quality issues. In data engineering, these insights influence sampling strategies, model choice, and storage optimizations. Seaborn automates frequency aggregation and styling, allowing engineers to focus on analysis rather than plotting boilerplate, which accelerates exploratory workflows and promotes data-driven decisions.
countplot() is purpose-built for categorical variables (strings or categories) and automatically counts occurrences. histplot() targets continuous or discrete numeric data and bins values; you must set discrete=True for integer counts.
Compute proportions with value_counts(normalize=True), convert the resulting Series to a DataFrame (for example with reset_index()), and pass it to sns.barplot(). Alternatively, annotate the countplot bars with percentages using ax.bar_label().
Seaborn follows the category order of a pandas Categorical column, or the order of first appearance for plain string columns. Provide an order parameter or convert the column to pandas.Categorical with an ordered category list.
Aggregate infrequent levels into an "Other" bucket before plotting, or filter the DataFrame to show only the top N categories for readability.
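The "Other" bucket can be sketched with plain pandas before any plotting. The country values and the top_n cutoff below are hypothetical. Ties at the cutoff are resolved by whatever order value_counts() returns.

```python
import pandas as pd

# Hypothetical column with a long tail of rare levels.
s = pd.Series(["US", "US", "US", "DE", "DE", "FR", "BR", "JP"], name="country")

# Keep only the top_n most frequent levels (illustrative cutoff).
top_n = 2
top_levels = s.value_counts().nlargest(top_n).index

# Relabel every other level as "Other".
lumped = s.where(s.isin(top_levels), "Other")
lumped_counts = lumped.value_counts().to_dict()

print(lumped_counts)
```

The lumped Series can then be plotted directly, for example with sns.countplot(x=lumped).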