Advanced Plotting with Python

In previous lectures, we have explored the fundamentals of data visualization using matplotlib and pandas. We learned how to create basic plots such as line plots, scatter plots, and histograms.

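As data analysis becomes more sophisticated, there are often cases where we want to explore data in more complex ways:

  • showing three variables at once with 3D plots (matplotlib)
  • producing statistical summaries and graphics with very little code (seaborn)
  • letting readers explore a figure interactively by hovering, zooming, and toggling series (plotly)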

In this lecture, we will explore these three powerful plotting approaches and learn how to combine them with pandas DataFrames for effective data exploration and communication.

3D Plotting with Matplotlib

Matplotlib’s mplot3d toolkit enables the creation of 3D plots. This is particularly useful when you want to visualize three continuous variables simultaneously or show relationships between multiple dimensions.

To use 3D plotting capabilities, we need to import the necessary libraries.
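A minimal set of imports for the 3D examples could look like this. NumPy is used to generate data, and the explicit Axes3D import is only strictly required on older matplotlib versions.

```python
import numpy as np
import matplotlib.pyplot as plt

# With matplotlib >= 3.2 the '3d' projection is registered automatically;
# on older versions this import is what makes projection='3d' available.
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401
```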

Creating a 3D Scatter Plot

Let’s start by creating some sample 3D data and visualizing it with a scatter plot.
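A sketch of one way to generate such data, assuming made-up variables: x and y are drawn from a standard normal distribution and z depends on both, plus some noise. The seed only makes the example reproducible.

```python
import numpy as np

# Hypothetical sample data: 200 points in 3D.
rng = np.random.default_rng(seed=0)
n_points = 200
x = rng.normal(size=n_points)
y = rng.normal(size=n_points)
# z is a noisy function of x and y, so the cloud has visible structure.
z = x**2 + y + rng.normal(scale=0.5, size=n_points)
```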

Now let’s create a 3D scatter plot. After creating a figure with plt.figure(), we add a 3D axis by passing projection='3d' to fig.add_subplot().
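A minimal sketch of a 3D scatter plot, using randomly generated (hypothetical) data. Colouring the points by z through the c= argument is an optional choice that makes depth easier to read.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data: z is a noisy function of x and y.
rng = np.random.default_rng(seed=0)
x = rng.normal(size=200)
y = rng.normal(size=200)
z = x**2 + y + rng.normal(scale=0.5, size=200)

fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection='3d')     # the '3d' projection gives an Axes3D

sc = ax.scatter(x, y, z, c=z, cmap='viridis', s=15)  # colour encodes z
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
fig.colorbar(sc, ax=ax, shrink=0.6, label='z')
plt.show()
```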

Creating a 3D Surface Plot

Surface plots are useful for visualizing functions of two variables, such as z = f(x, y).
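As a sketch, here is a surface plot of the example function z = sin(sqrt(x² + y²)), chosen only for illustration. plot_surface() expects 2D grids, so np.meshgrid() turns the two 1D coordinate arrays into matching X and Y grids first.

```python
import numpy as np
import matplotlib.pyplot as plt

# 1D coordinates, then 2D grids covering every (x, y) combination.
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))   # example function z = f(x, y)

fig = plt.figure(figsize=(6, 5))
ax = fig.add_subplot(projection='3d')
surf = ax.plot_surface(X, Y, Z, cmap='viridis', edgecolor='none')
ax.set_xlabel('x')
ax.set_ylabel('y')
ax.set_zlabel('z')
fig.colorbar(surf, ax=ax, shrink=0.6, label='z')
plt.show()
```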

Controlling the Viewing Angle

One of the advantages of 3D plots in matplotlib is the ability to rotate the view. You can use the view_init() method to set the elevation and azimuth angles. These are defined as follows:

Elevation Angle

Controls the vertical tilt of the viewing perspective. An elevation of 0° views the plot from the side, 90° views it from directly above, and negative values move the viewpoint below the plot, so that you look up at it from underneath.

Azimuth Angle

Controls the horizontal rotation around the plot. It determines which direction you’re “looking” at the 3D object. Common values are 0° (front view), 90° (right side), 180° (back view), and 270° (left side).

Together, these angles allow you to rotate a 3D plot to any desired perspective. For example:

  • view_init(elev=30, azim=45) gives a standard 3D perspective view
  • view_init(elev=90, azim=0) views the plot from directly above
  • view_init(elev=0, azim=0) views it from the front at ground level
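One way to compare these viewpoints is to draw the same surface three times, calling view_init() with a different (elev, azim) pair on each subplot. The surface function below is only an example.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(-3, 3, 40)
y = np.linspace(-3, 3, 40)
X, Y = np.meshgrid(x, y)
Z = np.sin(np.sqrt(X**2 + Y**2))   # example surface

# (elevation, azimuth, description) for the three views listed above.
views = [(30, 45, 'perspective'), (90, 0, 'from above'), (0, 0, 'front')]

fig = plt.figure(figsize=(12, 4))
for i, (elev, azim, title) in enumerate(views, start=1):
    ax = fig.add_subplot(1, 3, i, projection='3d')
    ax.plot_surface(X, Y, Z, cmap='viridis')
    ax.view_init(elev=elev, azim=azim)   # set the camera angles in degrees
    ax.set_title(f'{title}: elev={elev}, azim={azim}')
plt.show()
```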

Statistical Plotting with Seaborn

seaborn is built on top of matplotlib and provides a higher-level interface for creating statistical graphics. It integrates well with pandas DataFrames and automatically handles common tasks like aggregating data, computing statistics, and creating legends.

Some key advantages of seaborn are:

  • Well integrated with pandas
  • Easy style control with themes and color palettes
  • Built-in statistical estimation (error bars, confidence intervals, and regression lines)
  • Easier multi-plot grids (e.g. the FacetGrid class for faceted plots)
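As an illustration of the style control mentioned above, sns.set_theme() changes the global look of every subsequent plot, including plain matplotlib ones. The particular style ('whitegrid') and palette ('deep') are just one possible choice.

```python
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# White background with grid lines, and seaborn's "deep" colour palette.
sns.set_theme(style='whitegrid', palette='deep')

# Plots created afterwards pick up the theme automatically.
x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
for k in range(3):
    ax.plot(x, np.sin(x + k), label=f'phase {k}')
ax.legend()
plt.show()
```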

You can find the complete documentation for seaborn (with many examples) at https://seaborn.pydata.org

Scatter Plots with Regression Lines

One of seaborn’s most powerful functions is regplot(), which creates a scatter plot with a fitted line and confidence interval.

This is useful when we want to identify trends in the data. For now, treat the fitting procedure as a black box; we will look at fitting in more detail in later workshops.
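A minimal sketch of regplot() on hypothetical data (the columns 'hours' and 'score' are made up for this example). By default seaborn fits a straight line and shades a 95% confidence band around it.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data: score grows roughly linearly with hours studied.
rng = np.random.default_rng(seed=1)
hours = rng.uniform(0, 10, size=50)
df = pd.DataFrame({
    'hours': hours,
    'score': 40 + 5 * hours + rng.normal(scale=5, size=50),
})

# Scatter plot + fitted line + shaded confidence band.
ax = sns.regplot(data=df, x='hours', y='score')
ax.set_title('Score against hours studied')
plt.show()
```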

Categorical Plots

Often data is labelled by groups: types of experiments; locations; species; gender; status (“active”, “pending”) and infinitely many others.

seaborn provides several functions for visualizing categorical data. A very common way to visualize categorical data is the box plot.

Box plots provide a compact summary of a distribution for a numerical variable (e.g. the marks in an assessment), typically grouped by a categorical variable (the school of affiliation of the marked students).

The box plot is centered around the idea of quartiles:

  • Q1 (first quartile): 25th percentile
  • Q2 (second quartile): 50th percentile (median)
  • Q3 (third quartile): 75th percentile
  • Q4 (fourth quartile): 100th percentile (maximum)

These components are visualized as follows:

  • Box: interquartile range (IQR = Q3 − Q1), showing the middle 50% of data
  • Median: line inside the box (Q2)
  • Whiskers: extend to the most extreme data points within 1.5 × IQR from the quartiles (common convention)
  • Outliers: individual points beyond the whiskers, plotted separately
  • Notches (optional): approximate confidence interval around the median
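To make these definitions concrete, here is a small worked example on a made-up list of marks. It uses np.percentile with NumPy's default linear interpolation; other quartile conventions can give slightly different values.

```python
import numpy as np

marks = np.array([2, 4, 4, 5, 6, 7, 8, 9, 30])   # hypothetical marks

q1, q2, q3 = np.percentile(marks, [25, 50, 75])   # quartiles
iqr = q3 - q1                                     # height of the box

# Points beyond 1.5 * IQR from the box are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
inside = marks[(marks >= lower_fence) & (marks <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = marks[(marks < lower_fence) | (marks > upper_fence)]

print(q1, q2, q3, iqr)                      # 4.0 6.0 8.0 4.0
print(whisker_low, whisker_high, outliers)  # 2 9 [30]
```

Here the fences are -2 and 14, so the whiskers stop at the most extreme marks inside that range (2 and 9) and the mark of 30 is drawn as a separate outlier point.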

The boxplot() function creates box plots grouped by categories, while stripplot() draws each individual data point; layering a strip plot on top of a box plot shows the raw data together with its summary.
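A sketch of this layering on hypothetical data, reusing the marks-by-school example from above (the school names and mark distributions are invented). Passing showfliers=False to the box plot avoids drawing outliers twice, since the strip plot already shows every point.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical marks for 30 students in each of three schools.
rng = np.random.default_rng(seed=2)
schools = ['Physics', 'Maths', 'Chemistry']
df = pd.DataFrame({
    'school': np.repeat(schools, 30),
    'mark': np.concatenate([
        rng.normal(65, 10, 30),
        rng.normal(70, 8, 30),
        rng.normal(60, 12, 30),
    ]),
})

fig, ax = plt.subplots()
# Box plot summary per school (outliers hidden: the strip plot shows them).
sns.boxplot(data=df, x='school', y='mark', ax=ax,
            color='lightgray', showfliers=False)
# Individual marks, jittered horizontally so they do not overlap.
sns.stripplot(data=df, x='school', y='mark', ax=ax,
              color='black', size=3, alpha=0.6)
plt.show()
```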

Heatmaps

Heatmaps are useful for visualizing 2D arrays of data, particularly correlation matrices or contingency tables.

Seaborn’s heatmap() function can be used to create a heatmap from a pandas DataFrame. You can customize the color scheme, add annotations, and control the display of axes. It is similar to imshow() in matplotlib, but with additional features for handling data frames and annotations.
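As a sketch, the heatmap below shows the correlation matrix of three hypothetical columns: 'b' is built from 'a', so they should be strongly correlated, while 'c' is independent noise. Fixing vmin=-1 and vmax=1 keeps the colour scale symmetric and comparable across plots.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical data with one strong correlation (a, b).
rng = np.random.default_rng(seed=3)
a = rng.normal(size=100)
df = pd.DataFrame({
    'a': a,
    'b': 2 * a + rng.normal(size=100),
    'c': rng.normal(size=100),
})

corr = df.corr()   # 3x3 Pearson correlation matrix (a DataFrame)

fig, ax = plt.subplots(figsize=(4, 3.5))
sns.heatmap(
    corr,
    annot=True, fmt='.2f',             # write each coefficient in its cell
    cmap='coolwarm', vmin=-1, vmax=1,  # symmetric colour scale
    square=True, ax=ax,
)
plt.show()
```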

Multi-plot Grids

The FacetGrid class lets you create a grid of plots, one for each value of a categorical variable. This is powerful for comparing patterns across different subsets of data.

To use a FacetGrid, you first create the grid object and then map a plotting function to it. For example, you can create a grid of scatter plots for different species in the Iris dataset.

In particular, the map() method applies a plotting function to each subset of the data defined by the grid. It takes the plotting function (such as sns.scatterplot) followed by the names of the columns to plot, and calls that function once for each facet.

For example, if you have a FacetGrid that is faceted by a categorical variable (e.g., species), and you want to create a scatter plot of two numerical variables (e.g., sepal length and sepal width) for each species, you would use:

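import seaborn as sns
iris = sns.load_dataset('iris')  # example dataset, downloaded on first use
g = sns.FacetGrid(iris, col='species')  # one column of plots per species
g.map(sns.scatterplot, 'sepal_length', 'sepal_width')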

Interactive Plotting with Plotly

plotly is a powerful library for creating interactive, web-based visualizations. Unlike matplotlib, which is mainly used to produce static images, plotly figures are interactive and allow users to:

  • Hover over data points to see values
  • Zoom and pan to focus on regions of interest
  • Toggle data series on and off using the legend
  • Export plots as PNG images

Plotly integrates seamlessly with pandas DataFrames through the plotly.express module, which provides a simple interface similar to seaborn.

Plotly also supports 3D plotting and statistical plots, making it a versatile tool for data visualization.

All these features make plotly a great choice for interactive dashboards: these are simple web applications that allow users to explore data through interactive visualizations.

Basic Interactive Scatter Plot

Creating an interactive scatter plot with plotly is straightforward using plotly.express.scatter().

Interactive 3D Scatter Plot

Plotly can create interactive 3D scatter plots where you can rotate the plot using your mouse.

Interactive Box and Violin Plots

Plotly also provides functions for creating interactive statistical plots.

Interactive Line Plots with Multiple Series

Line plots with plotly allow you to interactively toggle series on and off.

Advanced: Subplots with Plotly

For more complex layouts, use plotly.subplots.make_subplots() to create grids of plots.

When to use each library:

| Library | Use case | Strengths | Limitations |
|---|---|---|---|
| Matplotlib (3D) | Academic publications, static images | Full control, publication-ready | Static, requires more code |
| Seaborn | Statistical analysis, exploratory data analysis | Attractive defaults, statistical features | Less customizable, static plots |
| Plotly | Interactive dashboards, web applications, presentations | Interactive, web-ready, modern look | Larger file sizes, learning curve |

All three libraries work seamlessly with pandas DataFrames:

  • Both matplotlib and seaborn accept DataFrames through the data= parameter
  • plotly.express functions work directly with DataFrames
  • For matplotlib’s 3D plots, extract arrays using .to_numpy() or access columns directly
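A short sketch of both matplotlib approaches on one hypothetical DataFrame: in 2D, column names are passed as strings together with data=df; in 3D, the columns are converted to NumPy arrays with .to_numpy() before calling scatter().

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical DataFrame with three numeric columns.
rng = np.random.default_rng(seed=6)
df = pd.DataFrame({'x': rng.normal(size=100), 'y': rng.normal(size=100)})
df['z'] = df['x'] - df['y'] + rng.normal(scale=0.3, size=100)

fig = plt.figure(figsize=(10, 4))

# 2D: matplotlib looks up the strings 'x' and 'y' as columns of `data`.
ax2d = fig.add_subplot(1, 2, 1)
ax2d.scatter('x', 'y', data=df)

# 3D: pass plain NumPy arrays extracted from the DataFrame.
ax3d = fig.add_subplot(1, 2, 2, projection='3d')
ax3d.scatter(df['x'].to_numpy(), df['y'].to_numpy(), df['z'].to_numpy())
plt.show()
```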