Revision exercises
These exercises are an opportunity to consolidate some techniques and to preview what lies ahead (e.g. using scipy).
Each exercise has a short description, a PROVIDED code cell with data or setup you should not modify, and a COMPLETE THIS CODE cell where you write your solution.
Exercise 1: Customising Plot Styles
Objective: Practice using matplotlib’s colour, linestyle, and marker options to produce a well-formatted figure.
Matplotlib supports a wide range of style options:
- Colours: named strings ("red", "midnightblue"), greyscale fractions ("0.5"), hex codes, or default cycle colours ("C0", "C1", "C2"). See the colour reference.
- Linestyles: "-" (solid), "--" (dashed), ":" (dotted), "-." (dash-dot). See the linestyle gallery.
- Markers: "o" (circle), "x" (cross), "." (point), "^" (triangle), etc. See the marker reference.
Task: The provided cell below defines three wave signals. Create a single figure with all three curves on the same axes:
- y1: red, dashed line with circle markers ("o")
- y2: midnightblue, solid line, no markers
- y3: grey ("0.5"), dotted line with cross markers ("x")
Add a title, x-label, y-label, and a legend.
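As a rough illustration of how the colour, linestyle, and marker options combine, here is a minimal sketch on a different curve from the task's signals. The data and label names are assumptions made up for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical demo data (not the signals from the provided cell)
x = np.linspace(0, 1, 11)
y = x**2

fig, ax = plt.subplots()
# color, linestyle and marker are independent keyword arguments
line, = ax.plot(x, y, color="C2", linestyle="-.", marker="^", label="x squared")
ax.set_title("Style options demo")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
```

Each extra curve is just another ax.plot() call on the same Axes with its own style arguments and label.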
Exercise 2: Saving Figures
Objective: Practice saving matplotlib and pandas figures to file using fig.savefig().
The file format is determined by the extension you provide:
- .png: PNG raster image (default)
- .pdf: PDF vector format (ideal for publications)
- Other formats: .eps, .ps, .svg
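The sketch below shows the extension selecting the output format. The curve, the file names, and the temporary output directory are assumptions chosen so the example does not overwrite anything in your working folder.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)
fig, ax = plt.subplots()
ax.plot(x, np.cos(x))

# Hypothetical output location: a temporary directory
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "demo.png")
svg_path = os.path.join(out_dir, "demo.svg")

fig.savefig(png_path)  # raster output, chosen by the ".png" extension
fig.savefig(svg_path)  # vector output, chosen by the ".svg" extension
```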
Task:
- The provided cell creates a matplotlib figure. Save it as both "sqrt_plot.png" and "sqrt_plot.pdf".
- Then practice saving a pandas plot using two different approaches:
  - Approach 1: Create a fig, ax pair with plt.subplots(), pass ax to df.plot(ax=ax), then save via fig.savefig().
  - Approach 2: Don’t create a fig explicitly; capture the Axes returned by ax = df.plot(), retrieve the Figure with ax.get_figure(), then save.
Approach 1 (Pandas): Pass an Axes object to df.plot() so you already hold the fig handle:
Approach 2 (Pandas): Let df.plot() return the Axes, then retrieve the Figure from it:
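The two approaches can be compared side by side in a minimal sketch. The DataFrame, the file names, and the temporary output folder are hypothetical stand-ins, not the data from the exercise.

```python
import os
import tempfile

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical DataFrame, standing in for whatever data you want to plot
df = pd.DataFrame({"value": [1, 4, 9, 16]})
out_dir = tempfile.mkdtemp()  # temporary folder, assumed here to avoid clutter

# Approach 1: create the Figure and Axes first, then hand the Axes to pandas
fig1, ax1 = plt.subplots()
df.plot(ax=ax1)
path1 = os.path.join(out_dir, "approach1.png")
fig1.savefig(path1)

# Approach 2: let pandas create everything, then recover the Figure from the Axes
ax2 = df.plot()
fig2 = ax2.get_figure()
path2 = os.path.join(out_dir, "approach2.png")
fig2.savefig(path2)
```

Approach 1 is usually more convenient when you also want to adjust the figure size or add further Axes before plotting.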
Exercise 3: Index-based Scatter Plot with Pandas
Objective: Learn how to create scatter-like plots in pandas when the x-axis should be the DataFrame index.
df.plot.scatter() requires two column names and cannot use the index as the x-axis. To produce a scatter-like appearance (markers, no connecting line) against the index, use df.plot.line() with two key parameters:
- marker=: any marker symbol, e.g. "o", "x", "."
- linestyle="None" (or " " or ""): suppresses the connecting line
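A minimal sketch of the two parameters in use is shown below. It assumes a small made-up DataFrame with a "reading" column, not the weather data from the exercise.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical readings indexed by day number
df = pd.DataFrame({"reading": [3.1, 2.7, 3.8, 3.3]}, index=[1, 2, 3, 4])

# Markers only: linestyle="None" removes the line joining the points
ax = df.plot.line(y="reading", marker="o", linestyle="None")
ax.set_xlabel("day")
ax.set_ylabel("reading")

line = ax.get_lines()[0]  # the single Line2D that pandas created
```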
Task:
- Using the weather DataFrame provided below, plot the "temperature" column as a scatter-like plot: circle markers ("o"), no connecting line.
- In the second code cell, create the same plot without linestyle="None" and observe what changes.
Exercise 4: Dual Y-axis Plots with twinx()
Objective: Create a figure that displays two datasets with different scales using two y-axes.
When two quantities have very different units or magnitudes, you can use ax.twinx() to create a second Axes that shares the x-axis but has an independent y-axis on the right side. Match the axis label and tick colours to the corresponding dataset so the figure is easy to read.
The default matplotlib cycle colours "C0", "C1", etc. are useful for visually separating the two datasets.
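The pattern can be sketched as follows, using two lines rather than a line and a bar chart. The speed and altitude values are invented for illustration and are unrelated to the climate data in the task.

```python
import matplotlib.pyplot as plt

# Hypothetical data on very different scales
hours = [0, 1, 2, 3, 4]
speed_kmh = [12, 15, 11, 14, 13]
altitude_m = [200, 450, 900, 700, 300]

fig, ax_left = plt.subplots()
ax_left.plot(hours, speed_kmh, color="C0")
ax_left.set_xlabel("Time (h)")
ax_left.set_ylabel("Speed (km/h)", color="C0")
ax_left.tick_params(axis="y", labelcolor="C0")

# Second Axes sharing the same x-axis, with its own y-axis on the right
ax_right = ax_left.twinx()
ax_right.plot(hours, altitude_m, color="C1")
ax_right.set_ylabel("Altitude (m)", color="C1")
ax_right.tick_params(axis="y", labelcolor="C1")
```

Any plotting method can be called on the second Axes, so swapping the second plot() call for bar() follows the same structure.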
Task: Using the monthly climate data provided below, create a dual y-axis figure:
- Left y-axis: monthly average temperature (°C) plotted as a line, coloured "C0"
- Right y-axis: monthly rainfall (mm) plotted as a bar chart, coloured "C1"
- Colour each y-axis label and its tick labels to match the data series
- Set the x-tick labels to month names and add a title
Exercise 5: Revision of Random Number Generation with NumPy
Objective: Practice creating and using a NumPy random number generator.
The modern NumPy approach is to create a generator object first, optionally with a fixed seed so results are reproducible:
from numpy import random
rng = random.default_rng(seed=42)
You can then call methods on rng to draw samples:
- rng.integers(low, high): a single random integer in [low, high)
- rng.integers(low, high, size=n): an array of n random integers
- rng.choice(seq): a single random element from a sequence
- rng.choice(seq, size=k, replace=False): k unique elements (no repeats)
- rng.shuffle(seq): shuffle a list in-place
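The calls listed above can be tried together in a short sketch. The ranges and lists are arbitrary examples and deliberately differ from the ones in the tasks.

```python
from numpy import random

rng = random.default_rng(seed=42)

one_int = rng.integers(1, 7)            # single integer in [1, 7), i.e. a die roll
many_ints = rng.integers(1, 7, size=5)  # array of 5 die rolls

letters = ["a", "b", "c", "d"]
one_pick = rng.choice(letters)                           # single random element
two_unique = rng.choice(letters, size=2, replace=False)  # 2 distinct elements

deck = [1, 2, 3, 4, 5]
rng.shuffle(deck)  # reorders deck in place and returns None
```

Because the generator is seeded, rerunning the whole cell from the top reproduces the same draws.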
Tasks:
- Create a generator with seed=99.
- Generate an array of 10 random integers between 1 and 100 (inclusive of 1, exclusive of 100).
- From the list ["rock", "paper", "scissors"], simulate 5 random choices with replacement.
- Shuffle the list [1, 2, 3, 4, 5] in-place and print it before and after.
Exercise 6: DatetimeIndex and Filtering by Month
Objective: Work with a pandas DatetimeIndex to filter rows by a specific month.
When a DataFrame has a DatetimeIndex, you can access time components via .index.month, .index.year, .index.day, etc. These return integer arrays that you can compare against a value to build a boolean mask for filtering rows.
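A minimal sketch of this comparison-then-filter pattern follows. It assumes a tiny made-up DataFrame spanning the end of January and the start of February.

```python
import pandas as pd

# Hypothetical daily data: Jan 30, Jan 31, Feb 1, Feb 2 (2024)
idx = pd.date_range("2024-01-30", periods=4, freq="D")
df = pd.DataFrame({"value": [10, 20, 30, 40]}, index=idx)

months = df.index.month  # integer month number for each row
mask = months == 2       # boolean array: True where the row falls in February
february = df[mask]      # keeps only the rows where mask is True
```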
Tasks — using the DataFrame provided below:
- Confirm the index is a DatetimeIndex by printing df.index.
- Extract only the rows from March (month 3) using .index.month.
- Calculate the mean value for each month across all years. (Hint: group by df.index.month using df.groupby(df.index.month).mean().)
Exercise 7: NaN Propagation
Objective: Understand how NaN values behave in arithmetic operations on pandas Series.
Key rules:
- NaN + number → NaN (NaN is “viral” in arithmetic)
- Pandas aggregation functions (.mean(), .sum(), etc.) skip NaN by default
- You can fill NaN values before operating with df["col"].fillna(value)
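The three rules can be checked on a small hypothetical Series, which is not the DataFrame from the exercise. The skipna=False call is included only to contrast with the default behaviour.

```python
import numpy as np
import pandas as pd

# Hypothetical Series with one missing value
s = pd.Series([1.0, np.nan, 3.0])

shifted = s + 10                    # NaN stays NaN: [11.0, NaN, 13.0]
mean_skipping = s.mean()            # NaN skipped: (1 + 3) / 2 = 2.0
mean_strict = s.mean(skipna=False)  # NaN not skipped, so the result is NaN
filled = s.fillna(0) + 10           # NaN replaced by 0 first: [11.0, 10.0, 13.0]
```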
Tasks — using the DataFrame provided below:
- Add the two columns "a" and "b" together. Observe what happens where one value is NaN.
- Add 100 to column "b". What happens to the NaN entries?
- Compute the mean of "b". Does pandas skip the NaN values?
- Fill the NaN values in "b" with 0 first, then add 100. How does the result differ?
Exercise 8: Finding a Minimum with scipy.optimize
Objective: Use scipy.optimize.minimize_scalar to find the minimum of a function, and restrict the search to a specific interval using bounds.
The lecture showed minimising \(f(x) = x^2\) — a simple case where the minimum is at 0. In practice you often need to find a minimum inside a specific range. Passing method="bounded" and bounds=(a, b) restricts the search to \([a, b]\):
from scipy.optimize import minimize_scalar
result = minimize_scalar(f, bounds=(0, 10), method="bounded")
print(result.x, result.fun)
Tasks:
- Define the function \(f(x) = \sin(x) + 0.1 \, x^2\).
- Use minimize_scalar with bounds=(0, 10) and method="bounded" to find its minimum in \([0, 10]\).
- Plot \(f(x)\) over \([0, 10]\) and mark the minimum with a red dot. Label the axes and add a legend.
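Before tackling the tasks, it can help to try the bounded search on a function whose minimum is known in advance. The sketch below uses a hypothetical test function g and an arbitrarily chosen interval, neither of which comes from the exercise.

```python
from scipy.optimize import minimize_scalar


def g(x):
    # Hypothetical test function with a known minimum at x = 2, where g(2) = 1
    return (x - 2) ** 2 + 1


# Search only inside the interval [0, 5]
result = minimize_scalar(g, bounds=(0, 5), method="bounded")

print(result.x, result.fun)  # location and value of the minimum that was found
```

Comparing result.x and result.fun against the known answer is a quick sanity check that the call is set up correctly.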
Exercise 9: Sampling from Distributions with scipy.stats
Objective: Use scipy.stats distribution objects to draw random samples and plot theoretical probability density functions (PDFs).
scipy.stats provides objects for many statistical distributions. Each object offers the following methods:
- .rvs(size=n, random_state=rng): draw n random samples
- .pdf(x): evaluate the theoretical probability density at each point in x
- .mean(), .std(): get the theoretical mean and standard deviation
For example, a normal distribution with mean 5 and standard deviation 2:
from scipy import stats
dist = stats.norm(loc=5, scale=2)
samples = dist.rvs(size=200, random_state=42)
Tasks:
- Create a normal distribution with mean 10 and standard deviation 3. Draw 500 samples from it.
- Plot a normalised histogram (density=True) of the samples.
- On the same axes, overlay the theoretical PDF curve using .pdf() evaluated over a suitable range of x values.
- Print the theoretical mean and standard deviation using .mean() and .std(), and compare them to samples.mean() and samples.std().
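As a sketch of how these methods fit together, the example below reuses the normal distribution with mean 5 and standard deviation 2 from the introduction. The seeded generator and the grid of x values are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

dist = stats.norm(loc=5, scale=2)
rng = np.random.default_rng(42)  # seeded generator for reproducible draws
samples = dist.rvs(size=200, random_state=rng)

x = np.linspace(-1, 11, 121)  # grid covering the mean plus or minus 3 standard deviations
density = dist.pdf(x)         # theoretical PDF evaluated on the grid

print(dist.mean(), dist.std())        # theoretical values
print(samples.mean(), samples.std())  # sample estimates, close but not identical
```

The density array can be passed straight to ax.plot(x, density) to draw the curve over a histogram of the samples.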