Revision exercises

These exercises are a chance to consolidate some techniques and glimpse what lies ahead (e.g. using scipy).

Each exercise has a short description, a PROVIDED code cell with data or setup you should not modify, and a COMPLETE THIS CODE cell where you write your solution.

Exercise 1: Customising Plot Styles

Objective: Practice using matplotlib’s colour, linestyle, and marker options to produce a well-formatted figure.

Matplotlib supports a wide range of style options:

  • Colours: named strings ("red", "midnightblue"), greyscale fractions ("0.5"), hex codes, or default cycle colours ("C0", "C1", "C2"). See the colour reference.
  • Linestyles: "-" (solid), "--" (dashed), ":" (dotted), "-." (dash-dot). See the linestyle gallery.
  • Markers: "o" (circle), "x" (cross), "." (point), "^" (triangle), etc. See the marker reference.
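As a rough illustration of how these options combine, the sketch below plots three made-up curves (not the signals used in the task) with different colour, linestyle, and marker settings. The data and label names are assumptions chosen only for the demo.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical demo data, not the signals from the exercise
x = np.linspace(0, 2 * np.pi, 20)

fig, ax = plt.subplots()
# Named colour, dash-dot line, triangle markers
ax.plot(x, np.sin(x), color="green", linestyle="-.", marker="^", label="sin")
# Default cycle colour "C1", solid line, no markers
ax.plot(x, np.cos(x), color="C1", linestyle="-", label="cos")
# Hex colour, markers only (linestyle="None" removes the connecting line)
ax.plot(x, 0.5 * np.sin(2 * x), color="#8800aa", linestyle="None",
        marker=".", label="half-amplitude")

ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Style options demo")
ax.legend()
```

The same keywords can be abbreviated (for example `ls` for `linestyle`), but the long names are easier to read back later.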

Task: The provided cell below defines three wave signals. Create a single figure with all three curves on the same axes:

  1. y1 — red, dashed line with circle markers ("o")
  2. y2 — midnightblue, solid line, no markers
  3. y3 — grey ("0.5"), dotted line with cross markers ("x")

Add a title, x-label, y-label, and a legend.

Exercise 2: Saving Figures

Objective: Practice saving matplotlib and pandas figures to file using fig.savefig().

The file format is determined by the extension you provide:

  • .png — PNG raster image (default)
  • .pdf — PDF vector format (ideal for publications)
  • Other formats: .eps, .ps, .svg
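A minimal sketch of extension-based format selection follows. The figure content and the file names are made up, and the files go into a temporary directory purely so the demo does not clutter your working folder.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

# Hypothetical figure, just something to save
fig, ax = plt.subplots()
x = np.linspace(0, 1, 50)
ax.plot(x, x**2)

# Temporary output folder (an assumption of this demo)
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "demo.png")
svg_path = os.path.join(out_dir, "demo.svg")

fig.savefig(png_path, dpi=150)  # raster output; dpi sets the resolution
fig.savefig(svg_path)           # vector output; format inferred from ".svg"
```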

Task:

  1. The provided cell creates a matplotlib figure. Save it as both "sqrt_plot.png" and "sqrt_plot.pdf".
  2. Then practice saving a pandas plot using two different approaches:
    • Approach 1: Create a fig, ax pair with plt.subplots(), pass ax to df.plot(ax=ax), then save via fig.savefig().
    • Approach 2: Don’t create a fig explicitly — capture the Axes returned by ax = df.plot(), retrieve the Figure with ax.get_figure(), then save.

Approach 1 (Pandas): Pass an Axes object to df.plot() so you already hold the fig handle:
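A minimal sketch of this first approach, assuming a small made-up DataFrame and a temporary output path (both hypothetical, not part of the provided cells):

```python
import os
import tempfile

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({"value": [1, 4, 9, 16]})  # made-up data

fig, ax = plt.subplots()   # we create the Figure ourselves...
df.plot(ax=ax)             # ...and tell pandas to draw on our Axes

out_path = os.path.join(tempfile.mkdtemp(), "approach1.png")
fig.savefig(out_path)      # fig is already in hand, so saving is direct
```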

Approach 2 (Pandas): Let df.plot() return the Axes, then retrieve the Figure from it:
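A minimal sketch of this second approach, again assuming a small made-up DataFrame and a temporary output path (both hypothetical):

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"value": [2, 3, 5, 8]})  # made-up data

ax = df.plot()           # pandas creates the Figure and returns the Axes
fig = ax.get_figure()    # recover the Figure that owns this Axes

out_path = os.path.join(tempfile.mkdtemp(), "approach2.png")
fig.savefig(out_path)
```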

Exercise 3: Index-based Scatter Plot with Pandas

Objective: Learn how to create scatter-like plots in pandas when the x-axis should be the DataFrame index.

df.plot.scatter() requires two column names and cannot use the index as the x-axis. To produce a scatter-like appearance (markers, no connecting line) against the index, use df.plot.line() with two key parameters:

  • marker= — any marker symbol, e.g. "o", "x", "."
  • linestyle="None" (or " " or "") — suppresses the connecting line
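To see the two parameters working together, here is a small sketch on made-up data (the column name "reading" and the index values are assumptions, not the weather data used in the task):

```python
import pandas as pd

# Hypothetical DataFrame with a numeric index
df = pd.DataFrame({"reading": [3.1, 2.7, 3.6, 3.0]}, index=[10, 20, 30, 40])

# Markers only: the index supplies the x-values, no connecting line is drawn
ax = df["reading"].plot.line(marker="x", linestyle="None")
ax.set_xlabel("index")
ax.set_ylabel("reading")
```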

Task:

  1. Using the weather DataFrame provided below, plot the "temperature" column as a scatter-like plot: circle markers ("o"), no connecting line.
  2. In the second code cell, create the same plot without linestyle="None" and observe what changes.

Exercise 4: Dual Y-axis Plots with twinx()

Objective: Create a figure that displays two datasets with different scales using two y-axes.

When two quantities have very different units or magnitudes, you can use ax.twinx() to create a second Axes that shares the x-axis but has an independent y-axis on the right side. Match the axis label and tick colours to the corresponding dataset so the figure is easy to read.

The default matplotlib cycle colours "C0", "C1", etc. are useful for visually separating the two datasets.
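The pattern can be sketched as follows. The data, variable names, and units here are invented for illustration and differ from the climate data in the task; both series are drawn as lines to keep the example short.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical data with very different magnitudes
t = np.arange(5)
speed = np.array([1.0, 2.5, 3.0, 2.0, 1.5])
count = np.array([120, 300, 450, 280, 150])

fig, ax_left = plt.subplots()
ax_right = ax_left.twinx()  # second Axes: shared x-axis, own y-axis on the right

ax_left.plot(t, speed, color="C2")
ax_left.set_ylabel("speed (m/s)", color="C2")
ax_left.tick_params(axis="y", labelcolor="C2")  # tick labels match the line

ax_right.plot(t, count, color="C3", linestyle="--")
ax_right.set_ylabel("count", color="C3")
ax_right.tick_params(axis="y", labelcolor="C3")

ax_left.set_xlabel("time step")
```

Note that each Axes keeps its own legend entries, so a combined legend needs the handles from both.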

Task: Using the monthly climate data provided below, create a dual y-axis figure:

  • Left y-axis: monthly average temperature (°C) plotted as a line, coloured "C0"
  • Right y-axis: monthly rainfall (mm) plotted as a bar chart, coloured "C1"
  • Colour each y-axis label and its tick labels to match the data series
  • Set the x-tick labels to month names and add a title

Exercise 5: Revision of Random Number Generation with NumPy

Objective: Practice creating and using a NumPy random number generator.

The modern NumPy approach is to create a generator object first, optionally with a fixed seed so results are reproducible:

from numpy import random
rng = random.default_rng(seed=42)

You can then call methods on rng to draw samples:

  • rng.integers(low, high) — a single random integer in [low, high)
  • rng.integers(low, high, size=n) — an array of n random integers
  • rng.choice(seq) — a single random element from a sequence
  • rng.choice(seq, size=k, replace=False) → k unique elements (no repeats)
  • rng.shuffle(seq) — shuffle a list in-place
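The calls listed above can be tried out in a short sketch like the one below. The ranges and the letter list are arbitrary choices for the demo, not the values requested in the tasks.

```python
from numpy import random

rng = random.default_rng(seed=42)

one = rng.integers(0, 6)            # single integer in [0, 6)
many = rng.integers(0, 6, size=8)   # array of 8 integers in [0, 6)

letters = ["a", "b", "c", "d"]
pick = rng.choice(letters)                             # one element
unique = rng.choice(letters, size=3, replace=False)    # 3 distinct elements

deck = [10, 20, 30, 40]
rng.shuffle(deck)                   # reorders deck in place, returns None

# Re-creating the generator with the same seed reproduces the same draws
rng_again = random.default_rng(seed=42)
same_first_draw = rng_again.integers(0, 6) == one
```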

Tasks:

  1. Create a generator with seed=99.
  2. Generate an array of 10 random integers between 1 and 100 (inclusive of 1, exclusive of 100).
  3. From the list ["rock", "paper", "scissors"], simulate 5 random choices with replacement.
  4. Shuffle the list [1, 2, 3, 4, 5] in-place and print it before and after.

Exercise 6: DatetimeIndex and Filtering by Month

Objective: Work with a pandas DatetimeIndex to filter rows by a specific month.

When a DataFrame has a DatetimeIndex, you can access time components via .index.month, .index.year, .index.day, etc. These return integer arrays; comparing one to a value (e.g. df.index.month == 3) gives a boolean mask you can use to filter rows.
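A small sketch of this idea, using a made-up five-day DataFrame that straddles the end of January (the dates and the column name "value" are assumptions for the demo):

```python
import pandas as pd

# Hypothetical data: 30 Jan to 3 Feb 2024, one row per day
idx = pd.date_range("2024-01-30", periods=5, freq="D")
df = pd.DataFrame({"value": [0, 1, 2, 3, 4]}, index=idx)

months = df.index.month    # integer month number for each row
mask = months == 2         # boolean array: True where the row is in February
feb = df[mask]             # keep only the February rows
```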

Tasks — using the DataFrame provided below:

  1. Confirm the index is a DatetimeIndex by printing df.index.
  2. Extract only the rows from March (month 3) using .index.month.
  3. Calculate the mean value for each month across all years. (Hint: group by df.index.month using df.groupby(df.index.month).mean().)

Exercise 7: NaN Propagation

Objective: Understand how NaN values behave in arithmetic operations on pandas Series.

Key rules:

  • NaN + number → NaN (NaN is “viral” in arithmetic)
  • Pandas aggregation functions (.mean(), .sum(), etc.) skip NaN by default
  • You can fill NaN values before operating with df["col"].fillna(value)
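These rules can be checked on a tiny made-up Series (the values are arbitrary and unrelated to the provided DataFrame):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])  # hypothetical data with one missing value

shifted = s + 10                   # element-wise: the NaN entry stays NaN
total = s.sum()                    # aggregation skips NaN by default
strict = s.sum(skipna=False)       # opting out of skipping gives NaN
filled = s.fillna(0) + 10          # fill first, then the arithmetic is defined

print(shifted.tolist(), total, strict, filled.tolist())
```

Whether 0 is a sensible fill value depends on what the missing entries mean; filling changes the data, whereas skipping only changes which entries an aggregation looks at.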

Tasks — using the DataFrame provided below:

  1. Add the two columns "a" and "b" together. Observe what happens where one value is NaN.
  2. Add 100 to column "b". What happens to the NaN entries?
  3. Compute the mean of "b" — does pandas skip the NaN values?
  4. Fill the NaN values in "b" with 0 first, then add 100. How does the result differ?

Exercise 8: Finding a Minimum with scipy.optimize

Objective: Use scipy.optimize.minimize_scalar to find the minimum of a function, and restrict the search to a specific interval using bounds.

The lecture showed minimising \(f(x) = x^2\) — a simple case where the minimum is at 0. In practice you often need to find a minimum inside a specific range. Passing method="bounded" and bounds=(a, b) restricts the search to \([a, b]\):

from scipy.optimize import minimize_scalar
result = minimize_scalar(f, bounds=(0, 10), method="bounded")
print(result.x, result.fun)
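The snippet above leaves f undefined. A runnable sketch, assuming a simple quadratic with a known minimum at x = 3 (a hypothetical stand-in, not the function from the tasks), could look like this:

```python
from scipy.optimize import minimize_scalar

def f(x):
    # Hypothetical test function: minimum value 1 at x = 3
    return (x - 3) ** 2 + 1

# Search restricted to [0, 10], which contains the true minimum
result = minimize_scalar(f, bounds=(0, 10), method="bounded")
print(result.x, result.fun)

# If the interval excludes x = 3, the search ends at (or very near) the closest edge
edge = minimize_scalar(f, bounds=(5, 10), method="bounded")
print(edge.x, edge.fun)
```

The second call illustrates that a bounded search reports the best point inside the interval, which need not be a true minimum of the function.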

Tasks:

  1. Define the function \(f(x) = \sin(x) + 0.1 \, x^2\).
  2. Use minimize_scalar with bounds=(0, 10) and method="bounded" to find its minimum in \([0, 10]\).
  3. Plot \(f(x)\) over \([0, 10]\) and mark the minimum with a red dot. Label the axes and add a legend.

Exercise 9: Sampling from Distributions with scipy.stats

Objective: Use scipy.stats distribution objects to draw random samples and plot theoretical probability density functions (PDFs).

scipy.stats provides objects for many statistical distributions. Each object lets you:

  • .rvs(size=n, random_state=rng) — draw n random samples
  • .pdf(x) — evaluate the theoretical probability density at each point in x
  • .mean(), .std() — get the theoretical mean and standard deviation

For example, a normal distribution with mean 5 and standard deviation 2:

from scipy import stats
dist = stats.norm(loc=5, scale=2)

samples = dist.rvs(size=200, random_state=42)
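Continuing with the same distribution, the sketch below exercises the other methods listed above. The grid of x values, the sample size, and the seed are arbitrary choices made for this demo.

```python
import numpy as np
from scipy import stats

dist = stats.norm(loc=5, scale=2)

# Theoretical density on a small grid spanning three standard deviations each side
x = np.linspace(-1, 11, 5)
density = dist.pdf(x)

# For a normal distribution the peak density is 1 / (scale * sqrt(2 * pi))
peak = dist.pdf(5)

# A NumPy generator can be passed as random_state for reproducible draws
rng = np.random.default_rng(0)
samples = dist.rvs(size=1000, random_state=rng)

print(dist.mean(), dist.std())        # theoretical values
print(samples.mean(), samples.std())  # sample estimates, close but not identical
```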

Tasks:

  1. Create a normal distribution with mean 10 and standard deviation 3. Draw 500 samples from it.

  2. Plot a normalised histogram (density=True) of the samples.

  3. On the same axes, overlay the theoretical PDF curve using .pdf() evaluated over a suitable range of x values.

  4. Print the theoretical mean and standard deviation using .mean() and .std(), and compare them to samples.mean() and samples.std().