Deforestation Data

We use the Our World in Data deforestation dataset.

It contains data on forest area and deforestation for various countries over time.

Its link is

https://ourworldindata.org/grapher/annual-deforestation.csv?v=1&csvType=full&useColumnShortNames=false

Exercise 1: Loading and pivoting the data

Read in the dataset using pandas.

Then, pivot the data so that you have years as the index, countries as the columns, and deforestation as the values.
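As an illustration of the pivot step, here is a minimal sketch on a toy table with made-up values. The column name `Deforestation` is an assumption, so inspect `df.columns` on the real data before pivoting; `Entity`, `Code` and `Year` are the column names mentioned in the notes further down.

```python
import pandas as pd

# Toy long-format table with made-up values (ha/year).
# "Deforestation" is an assumed column name; check df.columns on the real data.
df = pd.DataFrame({
    "Entity": ["Brazil", "Brazil", "India", "India"],
    "Code": ["BRA", "BRA", "IND", "IND"],
    "Year": [1990, 2000, 1990, 2000],
    "Deforestation": [4.0e6, 5.0e6, 0.4e6, 0.3e6],
})

# Long to wide: one row per year, one column per country or region.
wide = df.pivot(index="Year", columns="Entity", values="Deforestation")
print(wide)
```

For the real dataset, the toy frame would be replaced by the result of `pd.read_csv` called on the link given above.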

Exercise 2: Plotting deforestation for regions

The dataset contains rates in ha/year (hectares per year) for many different countries, as well as for various regions, including the world as a whole.

Find the names of these geographical regions in the dataset and create a plot with multiple lines, one for each region, showing deforestation over time. Also include the world as a whole.

Plot requirements:

- The plot should have appropriate labels and a legend.
- The lines should also have markers.
- The lines should be distinguishable by color and line style.
- The line for the World should be thicker than the others.
- The deforestation values should be shown in millions of hectares per year (i.e., divide the values by 1,000,000).

Notes.

- If you look carefully at the original dataset, you will notice a Code column that contains codes for countries and regions.
- You can pass a list of styles to the style argument of the plot method to give each line a different line style.
- If ax is the Axes object returned by the plot method, you can access its last plotted line as ax.lines[-1] and set its properties with methods such as set_linewidth() or set_color().
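The two plotting notes can be sketched as follows on a toy wide table with made-up values. The region names and numbers are placeholders, and the `Agg` backend line is only there so the sketch runs without a display; it can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import pandas as pd

# Made-up values in ha/year: years as index, one column per region.
wide = pd.DataFrame(
    {"Africa": [3.0e6, 3.5e6, 4.0e6],
     "Asia": [2.0e6, 1.8e6, 1.5e6],
     "World": [8.0e6, 7.5e6, 7.0e6]},
    index=[1990, 2000, 2010],
)

# One format string per column (marker + line style);
# colours still come from the default colour cycle.
ax = (wide / 1e6).plot(style=["o-", "s--", "^:"])

# "World" is the last column, so it is also the last plotted line.
ax.lines[-1].set_linewidth(3)

ax.set_xlabel("Year")
ax.set_ylabel("Deforestation (million ha/year)")
ax.legend(title="Region")
```

Placing the World column last is a design choice that makes `ax.lines[-1]` point at the right line; with a different column order, the line would have to be looked up by its label instead.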

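Exercise 3: Share of deforestation by continent in 2020

Now pick the year 2020 and create a pie chart showing how deforestation in that year is shared among the continents (i.e., Africa, Asia, Europe, North America, Oceania, South America). The share is the fraction of total world deforestation that is due to each continent.

Notes.

- You can improve the rendering of the pie chart by adding formatted percentage labels with the autopct argument when plotting with the pie kind.

Exercise 4: Stacked area plot of regional deforestation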

Create a stacked area plot showing how deforestation is distributed across continents over time (1990-2020).

Requirements:

- Use only the continental regions (Africa, Asia, Europe, North America, Oceania, South America)
- Show values in millions of hectares per year
- Only include positive values (deforestation, not reforestation)
- Add appropriate title and labels

Hint: Use kind='area' and set stacked=True.
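A minimal sketch of the hint, again on made-up values. Using `clip(lower=0)` to set negative entries to zero is one possible reading of "only include positive values", not the only one; stacked area plots in pandas expect each column to have a single sign, which clipping guarantees.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import pandas as pd

# Made-up values in ha/year; negative entries stand for net reforestation.
wide = pd.DataFrame(
    {"Africa": [3.0e6, 3.5e6, 4.0e6],
     "Asia": [2.0e6, -0.5e6, 1.5e6],
     "Europe": [0.2e6, 0.1e6, -0.1e6]},
    index=[1990, 2000, 2010],
)

# Replace negative values by zero, then convert to millions of hectares.
positive = wide.clip(lower=0) / 1e6

ax = positive.plot(kind="area", stacked=True)
ax.set_title("Deforestation by continent (toy values)")
ax.set_xlabel("Year")
ax.set_ylabel("Deforestation (million ha/year)")
```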

Exercise 5: Bar plot of top 10 countries with highest deforestation in 2010

Using the data for 2010, create a bar plot showing the top 10 countries with the highest deforestation rates in absolute terms.

Plot them so that they are sorted (ascending or descending, as you prefer).

You need to exclude regions and only consider countries.

Note.

- Boolean operators can be used to filter dataframes. For example, to keep only the rows where column A does not contain the value foo, we can write:

    filtered_df = df[~(df['A'] == 'foo')]

  where the ~ operator negates the boolean condition.
- You can use barh as the kind of plot to create horizontal bar plots.
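The filtering idea can be sketched on a toy table for a single year. The rule used here to recognise regions (a missing `Code`, or a code starting with `OWID_`) is an assumption about how the dataset labels aggregates and should be checked against the real data; the values and the `Deforestation` column name are made up.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import pandas as pd

# Made-up 2010 values in ha/year.
df_2010 = pd.DataFrame({
    "Entity": ["Brazil", "Indonesia", "Africa", "World", "India"],
    "Code": ["BRA", "IDN", None, "OWID_WRL", "IND"],
    "Deforestation": [4.0e6, 1.0e6, 3.5e6, 8.0e6, 0.4e6],
})

# Assumed rule: aggregates have no code or an "OWID_" code.
is_region = df_2010["Code"].isna() | df_2010["Code"].str.startswith("OWID_", na=False)
countries = df_2010[~is_region]  # ~ negates the boolean mask

# Top 2 here to keep the toy example small; use 10 on the real data.
top = countries.nlargest(2, "Deforestation").set_index("Entity")["Deforestation"]

# Ascending sort puts the largest bar at the top of a horizontal bar plot.
ax = (top.sort_values() / 1e6).plot(kind="barh")
ax.set_xlabel("Deforestation (million ha/year)")
```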

Exercise 6: Merging datasets and normalisation by area

The comparison above is partly unfair because countries differ in size (and hence in the extent of their forested area).

As a first approximation, we can normalise deforestation rates by the total area of each country.

To retrieve the total area data for countries, you’ll need to load a CSV file containing country areas. We can use the Our World in Data dataset on land area.

Here’s the link to the dataset:

https://ourworldindata.org/grapher/land-area-km.csv

We can read this second dataset into a separate dataframe.

A powerful feature of pandas is the ability to merge dataframes based on common columns.

This is done using the pd.merge function as follows:

merged_df = pd.merge(df1, df2, on=list_of_common_columns)

Using merging, combine the deforestation data for 2010 with the land area data.

Then, create a new column in the merged dataframe that contains the deforestation rate per square kilometer for each country.

Finally, create a bar plot showing the top 10 countries with the highest deforestation rates per square kilometer in 2010.

Notes.

When merging, provide a list of common columns to the on argument. In this case, the common columns are likely to be Entity, Year and Code.
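A minimal sketch of the merge and normalisation on toy frames with made-up values. The value column names `Deforestation` and `Land area` are hypothetical; the key columns follow the note above. Since deforestation is in ha/year and area in square kilometres, the resulting column is in hectares per square kilometre per year.

```python
import pandas as pd

# Made-up deforestation values (ha/year) for 2010.
defo = pd.DataFrame({
    "Entity": ["Brazil", "India"],
    "Code": ["BRA", "IND"],
    "Year": [2010, 2010],
    "Deforestation": [4.0e6, 0.4e6],
})

# Made-up land areas (km^2); "Land area" is a hypothetical column name.
area = pd.DataFrame({
    "Entity": ["Brazil", "India", "France"],
    "Code": ["BRA", "IND", "FRA"],
    "Year": [2010, 2010, 2010],
    "Land area": [8.0e6, 3.0e6, 0.5e6],
})

# The default inner join keeps only rows whose keys appear in both frames.
merged = pd.merge(defo, area, on=["Entity", "Code", "Year"])

# Hectares of deforestation per km^2 of land per year.
merged["Deforestation per km2"] = merged["Deforestation"] / merged["Land area"]
print(merged)
```

Because the join is an inner join, countries missing from either dataset silently drop out, so it is worth comparing the row counts before and after merging.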

Exercise 7: Bringing in vegetable oil data

We can further extend our analysis by bringing in data on vegetable oil production, which is a significant driver of deforestation in some regions.

First, we load the vegetable oil production dataset from Our World in Data, which contains production data for various types of oils (palm oil, soybean oil, sunflower oil, etc.).

Since deforestation is driven by various oil crops (not just palm oil), sum all vegetable oil production types to get a total for each country and year and store it as a new column. This provides a more comprehensive view of agricultural oil pressure.
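The row-wise sum can be sketched as below. The oil column names and all values are made up, and the assumption is that every column other than `Entity`, `Code` and `Year` holds a production figure; adjust the selection to the real column names.

```python
import pandas as pd

# Made-up production figures; a missing entry stands for "no data".
oil = pd.DataFrame({
    "Entity": ["Indonesia", "Brazil"],
    "Code": ["IDN", "BRA"],
    "Year": [2010, 2010],
    "Palm oil": [20.0, 0.3],
    "Soybean oil": [0.1, 7.0],
    "Sunflower oil": [None, 0.1],
})

# Assumption: everything except the identifier columns is an oil type.
oil_columns = [c for c in oil.columns if c not in ("Entity", "Code", "Year")]

# axis=1 sums across columns for each row; missing values are skipped.
oil["Total oil"] = oil[oil_columns].sum(axis=1)
print(oil[["Entity", "Year", "Total oil"]])
```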

Exercise 8: Pivoting the vegetable oil data by country

To get a by-country view of vegetable oil production, pivot the vegetable oil dataset so that you have years as the index, countries as the columns, and total vegetable oil production as the values.

Exercise 9: Select countries for comparison

Filter to include only countries that appear in both the top 10 deforestation list (from 2010, Exercise 5) AND have oil production data.
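One way to sketch this selection, assuming the top-10 names are available as a list and the oil data has already been pivoted with countries as columns. The names `top10` and `oil_wide` and all values are hypothetical.

```python
import pandas as pd

# Hypothetical list of top deforestation countries (shortened for the sketch).
top10 = ["Brazil", "Indonesia", "Myanmar"]

# Made-up pivoted oil production: years as index, countries as columns.
oil_wide = pd.DataFrame(
    {"Brazil": [5.0, 7.4], "India": [6.0, 8.0], "Indonesia": [9.0, 20.1]},
    index=[2000, 2010],
)

# Keep the top-10 order, dropping countries without an oil column.
selected = [c for c in top10 if c in oil_wide.columns]
oil_selected = oil_wide[selected]
print(selected)
```

A country can also have a column that is entirely empty; `oil_selected.dropna(axis=1, how="all")` would remove such columns.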

Exercise 10: Plot vegetable oil production over time for selected countries

We normalise all oil production values relative to 1990 (baseline = 100) to make it easier to compare growth rates across countries of different sizes.

Produce two plots:

- first, the normalised vegetable oil production over time for the selected countries
- then, a smoothed version of the same plot (using a rolling mean with a window of 10 years).
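The normalisation and smoothing can be sketched on a toy table with made-up, linearly growing values. Dividing by the 1990 row relies on pandas aligning that row with the columns, so each country is divided by its own 1990 value.

```python
import pandas as pd

# Made-up production values: years as index, one column per country.
years = list(range(1985, 2001))
oil_selected = pd.DataFrame(
    {"Brazil": [float(y - 1980) for y in years],
     "Indonesia": [2.0 * (y - 1980) for y in years]},
    index=years,
)

# Baseline 1990 = 100: each column is divided by its own 1990 value.
normalised = oil_selected / oil_selected.loc[1990] * 100

# 10-year rolling mean; the first 9 rows are NaN because the window is incomplete.
smoothed = normalised.rolling(window=10).mean()

print(normalised.loc[[1990, 2000]])
print(smoothed.dropna().head())
```

Calling `.plot()` on `normalised` and on `smoothed` then gives the two requested figures. The rolling mean is trailing by default, so the smoothed curve lags the raw one; passing `center=True` to `rolling` is an alternative worth comparing.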

Exercise 11: Comparing deforestation and oil production over time

For each of the following countries (Brazil, the Democratic Republic of Congo, India and Indonesia), plot a figure with two lines:

- the normalised vegetable oil production over time (use the smoothed version from Exercise 10)
- the deforestation rate over time (also normalised to 1990)

Both lines should be on the same plot with the same y-axis scale (since both use 1990 = 100 as baseline).

Note: You may observe inverse correlations in some countries; this is expected and scientifically meaningful (see the interpretation below).

Hints.

- You may need to drop NaN values with the dropna() method to get continuous lines.
- You first need to normalise your deforestation data.
- You may want to create a new dataframe using pd.DataFrame and a suitable dictionary.
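The three hints can be combined as in the sketch below for a single country. All values are made up, and the series names are hypothetical stand-ins for one column of the smoothed oil table and one column of the pivoted deforestation table.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; not needed in a notebook
import numpy as np
import pandas as pd

# Made-up series for one country, indexed by year.
oil_smoothed = pd.Series([np.nan, 100.0, 130.0, 170.0], index=[1980, 1990, 2000, 2010])
defo = pd.Series([5.0e6, 4.0e6, 3.0e6], index=[1990, 2000, 2010])

# Normalise deforestation to 1990 = 100, like the oil data.
defo_norm = defo / defo.loc[1990] * 100

# A dictionary of Series becomes a DataFrame aligned on the union of years.
both = pd.DataFrame({
    "Vegetable oil (smoothed)": oil_smoothed,
    "Deforestation": defo_norm,
})

# Drop years where either series is missing so both lines are continuous.
both = both.dropna()

ax = both.plot(marker="o", title="Toy country (1990 = 100)")
ax.set_xlabel("Year")
ax.set_ylabel("Index (1990 = 100)")
```

Wrapping these steps in a loop over the four country names would produce one figure per country.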

Interpretation of Results

You should observe different trends depending on the country considered. These can reflect different modes of production, patterns of deforestation and shifts in policy:

- Some countries show a correlation between deforestation rates and vegetable oil production: this is typically the case when vegetable oil production requires direct area increases and deforestation is diffuse (small-scale clearing of land).
- Other countries can show an inverse correlation: this can have various causes, for example time lags between extensive deforestation and use of the land, as well as different means of expanding vegetable oil production (conversion of already cleared land, e.g. pasture, into crops).

This exercise demonstrates that macroscale data analysis can be rather complex and requires multiple layers of information and datasets to be properly interpreted. In this specific case, we would need more information on the modes of production, land use changes, policies and economic drivers to make sense of the observed trends.
