Deforestation Data
We use the Our World in Data deforestation dataset.
It contains forest area and deforestation data for various countries over time.
Its link is
Exercise 1: Loading and pivoting the data
Read in the dataset using pandas.
Then, pivot the data so that you have years as the index, countries as the columns, and deforestation as the values.
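A minimal sketch of Exercise 1. It assumes the CSV has Entity, Code and Year columns plus one deforestation column. The column name Deforestation, the helper load_and_pivot and the toy rows are placeholders (made-up numbers, not OWID data), so check df.columns on the real file and pass its URL or local path instead of the StringIO buffer.

```python
import io

import pandas as pd

# Toy stand-in for the OWID CSV: made-up numbers, NOT real data.
# Assumption: the real file has Entity, Code, Year and one deforestation
# column; "Deforestation" is a placeholder name, so check df.columns.
SAMPLE_CSV = """Entity,Code,Year,Deforestation
Country A,AAA,1990,3000000
Country A,AAA,2000,4000000
Country B,BBB,1990,1500000
Country B,BBB,2000,2000000
World,OWID_WRL,1990,15000000
World,OWID_WRL,2000,16000000
"""


def load_and_pivot(source, value_col="Deforestation"):
    """Read the long-format CSV and reshape it to years x countries."""
    df = pd.read_csv(source)
    # One row per year, one column per entity, deforestation as the values
    wide = df.pivot(index="Year", columns="Entity", values=value_col)
    return df, wide


# With the real data, pass the CSV URL or a local file path instead
df, wide = load_and_pivot(io.StringIO(SAMPLE_CSV))
print(wide)
```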
Exercise 2: Plotting deforestation for regions
The dataset contains deforestation rates in ha/year (hectares per year) for many countries, as well as for various regions, including the world as a whole.
Find the names of these regions in the dataset and create a plot with multiple lines, one for each region, showing deforestation over time. Also include the world as a whole.
Plot requirements:
- The plot should have appropriate labels and a legend.
- The lines should also have markers.
- The lines should be distinguishable by color and line style.
- The line for the World should be thicker than the others.
- The deforestation values should be shown in millions of hectares per year (i.e., divide the values by 1,000,000).
Notes.
- If you look carefully at the original dataset, you will notice a Code column that contains codes for countries and regions.
- You can pass a list of styles to the style argument of the plot method to give each line a different line style.
- If ax is the Axes object returned by a plot call, you can access its last plotted line as ax.lines[-1] and set its properties with methods such as set_linewidth() or set_color().
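One possible sketch for Exercise 2. It assumes (verify this on the real file) that aggregate regions have an empty Code while World has a code of its own, so World is added explicitly. The helpers find_regions and plot_regions and the toy dataframe are hypothetical. World is placed last so that it is ax.lines[-1], as suggested in the notes.

```python
import itertools

import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def find_regions(df):
    """Assumption: regions have no Code; World has one, so add it explicitly."""
    is_region = df["Code"].isna() | (df["Entity"] == "World")
    return sorted(df.loc[is_region, "Entity"].unique())


def plot_regions(wide, regions):
    # Put World last so that it is ax.lines[-1] after plotting
    cols = [r for r in regions if r != "World"] + ["World"]
    data = wide[cols] / 1e6  # ha/year -> million ha/year
    base_styles = ["o-", "s--", "^-.", "d:", "v-", "x--"]
    # One marker+linestyle per column, cycling if there are many regions
    styles = [s for _, s in zip(cols, itertools.cycle(base_styles))]
    ax = data.plot(style=styles, figsize=(10, 6))
    ax.lines[-1].set_linewidth(4)  # make the World line thicker
    ax.set_xlabel("Year")
    ax.set_ylabel("Deforestation (million ha/year)")
    ax.set_title("Deforestation by region")
    ax.legend()  # rebuild the legend so it shows the thicker World line
    return ax


# Toy data (made-up values): two regions without Code, World, one country
toy = pd.DataFrame({
    "Entity": ["Region X", "Region X", "Region Y", "Region Y",
               "World", "World", "Country A", "Country A"],
    "Code": [np.nan, np.nan, np.nan, np.nan,
             "OWID_WRL", "OWID_WRL", "AAA", "AAA"],
    "Year": [1990, 2000] * 4,
    "Deforestation": [2e6, 3e6, 1e6, 1.5e6, 10e6, 12e6, 0.5e6, 0.7e6],
})
wide = toy.pivot(index="Year", columns="Entity", values="Deforestation")
regions = find_regions(toy)
ax = plot_regions(wide, regions)
```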
Exercise 4: Stacked area plot of regional deforestation
Create a stacked area plot showing how deforestation is distributed across continents over time (1990-2020).
Requirements:
- Use only the continental regions (Africa, Asia, Europe, North America, Oceania, South America).
- Show values in millions of hectares per year.
- Only include positive values (deforestation, not reforestation).
- Add an appropriate title and labels.
Hint: Use kind='area' and set stacked=True.
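A sketch for Exercise 4, assuming the wide years x entities table from Exercise 1 and that the continent names match the OWID entity names exactly (a missing name raises a KeyError). Here "only positive values" is interpreted as clipping negatives to zero with clip(lower=0); this is one reasonable reading, and it also matters in practice because pandas refuses a stacked area plot when a column mixes positive and negative values. The helper name and toy values are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

CONTINENTS = ["Africa", "Asia", "Europe", "North America", "Oceania", "South America"]


def plot_continents_area(wide):
    """wide: years x entities in ha/year, as produced in Exercise 1."""
    # Restrict to 1990-2020 (label slicing needs a sorted year index)
    data = wide.loc[1990:2020, CONTINENTS] / 1e6  # million ha/year
    data = data.clip(lower=0)  # keep deforestation only: negatives -> 0
    ax = data.plot(kind="area", stacked=True, figsize=(10, 6))
    ax.set_xlabel("Year")
    ax.set_ylabel("Deforestation (million ha/year)")
    ax.set_title("Deforestation by continent, 1990-2020")
    return ax, data


# Toy data (made-up values), including a year outside the range
years = [1985, 1990, 2000, 2010, 2020]
wide = pd.DataFrame(
    {c: [1e6 * (i + 1)] * len(years) for i, c in enumerate(CONTINENTS)},
    index=years,
)
wide.loc[2000, "Europe"] = -0.5e6  # toy net-reforestation value
ax, data = plot_continents_area(wide)
```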
Exercise 5: Bar plot of top 10 countries with highest deforestation in 2010
Using the data for 2010, create a bar plot showing the top 10 countries with the highest deforestation rates in absolute terms.
Plot them so that they are sorted (ascending or descending, as you prefer).
You need to exclude regions and only consider countries.
Note.
- Boolean conditions can be used to filter dataframes. For example, to keep only the rows where column A does not contain the value foo, we can write:
filtered_df = df[~(df['A'] == 'foo')]
where the ~ operator negates the boolean condition.
- You can use barh as the kind of plot to create horizontal bar plots.
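A sketch for Exercise 5 that reuses the ~ negation from the note to drop regions. It keeps the assumption that regions have an empty Code and that World is the only coded aggregate; the helpers top10_countries and plot_top10, the Deforestation column name and the toy rows are hypothetical. Sorting ascending before barh puts the largest bar at the top.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def top10_countries(df, year=2010, value_col="Deforestation"):
    """df: long-format data with Entity, Code, Year and the value column."""
    # Assumption: regions have no Code; World is the only coded aggregate
    is_region = df["Code"].isna() | (df["Entity"] == "World")
    countries = df[~is_region & (df["Year"] == year)]
    top = countries.set_index("Entity")[value_col].nlargest(10)
    return top.sort_values()  # ascending: largest bar ends up on top in barh


def plot_top10(top, year=2010):
    ax = (top / 1e6).plot(kind="barh", figsize=(8, 6))
    ax.set_xlabel("Deforestation (million ha/year)")
    ax.set_ylabel("")
    ax.set_title(f"Top 10 countries by deforestation in {year}")
    return ax


# Toy data (made-up values): 12 countries, one region, World, another year
rows = [(f"Country {i}", f"C{i:02d}", 2010, i * 1e5) for i in range(1, 13)]
rows += [("Region X", np.nan, 2010, 5e7),
         ("World", "OWID_WRL", 2010, 1e8),
         ("Country 1", "C01", 2000, 9e7)]
df = pd.DataFrame(rows, columns=["Entity", "Code", "Year", "Deforestation"])
top = top10_countries(df)
ax = plot_top10(top)
```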
Exercise 6: Merging datasets and normalisation by area
The comparison above is partly unfair because countries have different sizes (and hence different extents of forested surface).
As a first approximation, we can normalise deforestation rates by the total area of each country.
To retrieve the total area data for countries, you’ll need to load a CSV file containing country areas. We can use the Our World in Data dataset on land area.
Here’s the link to the dataset:
https://ourworldindata.org/grapher/land-area-km.csv
We can read this second dataset into a separate dataframe.
A powerful feature of pandas is the ability to merge dataframes based on common columns.
This is done using the pd.merge function as follows:
merged_df = pd.merge(df1, df2, on=list_of_common_columns)
Using merging, combine the deforestation data for 2010 with the land area data.
Then, create a new column in the merged dataframe that contains the deforestation rate per square kilometer for each country.
Finally, create a bar plot showing the top 10 countries with the highest deforestation rates per square kilometer in 2010.
Notes.
- When merging, provide a list of common columns to the on argument. In this case, the common columns are likely to be Entity, Year and Code.
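A sketch of the merge and normalisation in Exercise 6. The area column name land_area_km2, the output column defor_per_km2 and the helper deforestation_per_km2 are hypothetical, so rename them to match the real CSV headers. Dividing ha/year by km² gives hectares deforested per km² of land per year; since 1 km² = 100 ha, dividing that by 100 would give the fraction of land cleared per year. Note that pd.merge performs an inner join by default, so countries missing from either dataset (or with no 2010 area row) silently drop out.

```python
import pandas as pd


def deforestation_per_km2(defor, area, year=2010,
                          defor_col="Deforestation", area_col="land_area_km2"):
    """Merge one year of deforestation data with land area and normalise.

    defor_col and area_col are placeholder names for the real CSV headers.
    """
    defor_year = defor[defor["Year"] == year]
    # Inner join on the shared identifier columns
    merged = pd.merge(defor_year, area, on=["Entity", "Year", "Code"])
    # ha/year divided by km^2: hectares deforested per km^2 of land per year
    merged["defor_per_km2"] = merged[defor_col] / merged[area_col]
    return merged


# Toy data (made-up values)
defor = pd.DataFrame({
    "Entity": ["Country A", "Country B", "Country A"],
    "Code": ["AAA", "BBB", "AAA"],
    "Year": [2010, 2010, 2000],
    "Deforestation": [1000.0, 500.0, 800.0],
})
area = pd.DataFrame({
    "Entity": ["Country A", "Country B"],
    "Code": ["AAA", "BBB"],
    "Year": [2010, 2010],
    "land_area_km2": [100.0, 10.0],
})
merged = deforestation_per_km2(defor, area)
# Top 10 by normalised rate, sorted for a barh plot as in Exercise 5
top = merged.set_index("Entity")["defor_per_km2"].nlargest(10).sort_values()
print(top)
```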
Exercise 7: Bringing vegetable oil data in
We can further extend our analysis by bringing in data on vegetable oil production, which is a significant driver of deforestation in some regions.
First, we load the vegetable oil production dataset from Our World in Data, which contains production data for various types of oils (palm oil, soybean oil, sunflower oil, etc.).
Since deforestation is driven by various oil crops (not just palm oil), sum all vegetable oil production types to get a total for each country and year and store it as a new column. This provides a more comprehensive view of agricultural oil pressure.
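A sketch of the summation in Exercise 7. It assumes every column other than Entity, Code and Year is a production quantity in the same unit; the real OWID file may carry other columns, so check before summing. The column names palm_oil and soybean_oil, the output name total_vegetable_oil and the helper add_total_oil are hypothetical. Passing min_count=1 keeps rows with no data at all as NaN rather than turning them into a misleading 0.

```python
import numpy as np
import pandas as pd


def add_total_oil(oil, id_cols=("Entity", "Code", "Year")):
    """Sum every production column (all columns except the identifiers)."""
    prod_cols = [c for c in oil.columns if c not in id_cols]
    out = oil.copy()
    # Missing oil types count as 0, but an all-missing row stays NaN
    out["total_vegetable_oil"] = out[prod_cols].sum(axis=1, min_count=1)
    return out


# Toy data (made-up values and placeholder column names)
oil = pd.DataFrame({
    "Entity": ["Country A", "Country A", "Country B"],
    "Code": ["AAA", "AAA", "BBB"],
    "Year": [1990, 1991, 1990],
    "palm_oil": [10.0, np.nan, np.nan],
    "soybean_oil": [5.0, 3.0, np.nan],
})
oil = add_total_oil(oil)
print(oil)
```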
Exercise 8: Pivoting the vegetable oil data by country
To get a by-country view of vegetable oil production, pivot the vegetable oil dataset so that you have years as the index, countries as the columns, and total vegetable oil production as the values.
Exercise 9: Select countries for comparison
Filter the data to include only the countries that both appear in the top 10 deforestation list (from 2010, Exercise 5) AND have oil production data.
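A sketch of the selection in Exercise 9, assuming both datasets use the same OWID entity names and that "having oil production data" means at least one non-missing value in the pivoted table. The helper select_countries and the toy inputs are hypothetical; top_countries would be the index of the top 10 series from Exercise 5.

```python
import numpy as np
import pandas as pd


def select_countries(top_countries, oil_wide):
    """Keep top-deforestation countries (in their original order) that have
    at least one non-missing oil production value."""
    # Drop columns that are entirely NaN, i.e. countries with no oil data
    with_oil = set(oil_wide.dropna(axis=1, how="all").columns)
    return [c for c in top_countries if c in with_oil]


top_countries = ["Country A", "Country B", "Country C"]
oil_wide = pd.DataFrame(
    {"Country A": [1.0, 2.0],
     "Country B": [np.nan, np.nan],  # present but without data
     "Country D": [3.0, 4.0]},
    index=[1990, 1991],
)
selected = select_countries(top_countries, oil_wide)
print(selected)  # ['Country A']
```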
Exercise 10: Plot vegetable oil production over time for selected countries
We normalise all oil production values relative to 1990 (baseline = 100) to make it easier to compare growth rates across countries of different sizes.
Produce two plots:
- first the normalised vegetable oil production over time for the selected countries
- then a smoothed version of the same plot (using a rolling mean with a window of 10 years).
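A sketch of the normalisation and smoothing in Exercise 10: each column is divided by its 1990 value and multiplied by 100, then a 10-row rolling mean is applied. The rolling window counts rows, so it equals 10 years only if the table has one row per year; with the default settings the first 9 rows of the smoothed table are NaN. The helpers normalise_to_baseline and smooth and the toy table are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def normalise_to_baseline(wide, baseline=1990):
    """Rescale each column so that its baseline-year value equals 100."""
    # div aligns the baseline row (indexed by country) on the columns
    return wide.div(wide.loc[baseline]) * 100


def smooth(wide, window=10):
    """Rolling mean over `window` consecutive rows (years, if annual)."""
    return wide.rolling(window=window).mean()


# Toy data (made-up values): years x countries
oil_wide = pd.DataFrame(
    {"Country A": [50.0 + 5 * i for i in range(15)],
     "Country B": [20.0] * 15},
    index=range(1990, 2005),
)
normalised = normalise_to_baseline(oil_wide)
smoothed = smooth(normalised)

ax1 = normalised.plot(marker="o", title="Vegetable oil production (1990 = 100)")
ax2 = smoothed.plot(title="Vegetable oil production, 10-year rolling mean (1990 = 100)")
for ax in (ax1, ax2):
    ax.set_xlabel("Year")
    ax.set_ylabel("Index (1990 = 100)")
```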
Exercise 11: Comparing deforestation and oil production over time
For each of the following countries (Brazil, the Democratic Republic of Congo, India and Indonesia), plot a figure with two lines:
- the normalised vegetable oil production over time (use the smoothed version from Exercise 10)
- the deforestation rate over time (also normalised to 1990)
Both lines should be on the same plot with the same y-axis scale (since both use 1990 = 100 as baseline).
Note: you may observe inverse correlations in some countries; this is expected and scientifically meaningful (see the interpretation below).
Hints.
- You may need to drop NaN values with the dropna() method to get continuous lines.
- You first need to normalise your deforestation data.
- You may want to create a new dataframe using pd.DataFrame and a suitable dictionary
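A sketch for Exercise 11 following the hints. It assumes oil_smoothed is the smoothed, normalised table from Exercise 10 and defor_wide the raw pivoted deforestation table from Exercise 1, both indexed by year; the helper compare_country and the toy inputs are hypothetical. Each series is plotted after its own dropna(), so the sparse deforestation years still form a continuous line instead of isolated points.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd


def compare_country(country, oil_smoothed, defor_wide, baseline=1990):
    """Plot smoothed oil production and deforestation, both baseline = 100."""
    # Normalise deforestation to the baseline year first
    defor_norm = defor_wide[country] / defor_wide.loc[baseline, country] * 100
    oil_label = f"Vegetable oil production (smoothed, {baseline} = 100)"
    defor_label = f"Deforestation ({baseline} = 100)"
    # The dict of Series aligns on the union of both year indexes
    combined = pd.DataFrame({oil_label: oil_smoothed[country], defor_label: defor_norm})
    fig, ax = plt.subplots(figsize=(9, 5))
    for col, style in zip(combined.columns, ["-", "o--"]):
        # Drop NaNs per series so gaps between sparse years do not break lines
        combined[col].dropna().plot(ax=ax, style=style, label=col)
    ax.set_xlabel("Year")
    ax.set_ylabel(f"Index ({baseline} = 100)")
    ax.set_title(country)
    ax.legend()
    return ax, combined


# Toy data (made-up values): annual oil index, sparse deforestation years
oil_smoothed = pd.DataFrame({"Country A": [100.0 + i for i in range(21)]},
                            index=range(1990, 2011))
defor_wide = pd.DataFrame({"Country A": [2e6, 3e6, 1e6]}, index=[1990, 2000, 2010])
ax, combined = compare_country("Country A", oil_smoothed, defor_wide)
```

For the real data, the same function can be called in a loop over the four country names, after checking that they match the entity names used in both datasets.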
Interpretation of Results
You should observe different trends depending on the country under consideration, which can reflect different modes of production, deforestation patterns and shifts in policies:
- Some countries show a correlation between deforestation rates and vegetable oil production: this is typically the case when vegetable oil production requires direct increases in area and deforestation is diffuse (small-scale clearing of land)
- Other countries can show an inverse correlation: this can have various causes, for example time lags between extensive deforestation and the use of the land, as well as different means of expanding vegetable oil production (conversion of already cleared land, e.g. pasture, into crops)
This exercise demonstrates that macroscale data analysis can be rather complex and requires multiple layers of information and datasets to be properly interpreted. In this specific case, we would need more information on the modes of production, land use changes, policies and economic drivers to make sense of the observed trends.