Time Series Visualisation Tutorial For Petroleum Engineers
As Petroleum Engineers we work with Time Series data, yet most Python visualisation tutorials are missing Time Series. Here we use Pandas and Matplotlib to explore global yearly Electricity demand.
COP28 is in full swing and Climate related news is the headline in many news outlets again. One particular report that got a lot of attention in the lead up to COP28 was an article called “How Electricity Is Changing, Country by Country” published by The New York Times. The article is a great example of how to use data visualisation for storytelling. For a data visualisation wizard there is nothing more fun than finding a clean dataset. For us this dataset has the benefit of being Time Series, so what we learn can be easily applied to rate, volume and pressure versus time plots.
Below, as we explore the dataset together, I will show you how to create some basic plots and look into datasets. The dataset is called “Yearly electricity data” and it has been put together by Ember. You can find it here. The dataset description reads:
This dataset contains yearly electricity generation, capacity, emissions, import and demand data for over 200 geographies. Data is collected from multi-country datasets (EIA, Eurostat, BP, UN) as well as national sources (e.g China data from the National Bureau of Statistics).
I am using Python in an interactive mode (Jupyter in VSCode) for this tutorial. If you want to learn how to use Python before jumping into data visualisation, please read my first two articles here and here.
I start by downloading the dataset, loading the two main libraries and reading the data into a Pandas DataFrame:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('yearly_full_release_long_format.csv')
A simple df.head() shows the first 5 rows of the data so we can see what has been presented to us. It also helps to check that the data was read the way we were expecting:
This shows us that the data consists of Area and Country code, followed by the year that the data was collected. Area can be a country or a continent. We also have columns for regions and the various groups that a country can be a member of.
If we pause here for a second, you can appreciate how powerful these metadata columns are and how this handful of columns can help us report the values on many different levels. Category, Subcategory, Variable, Unit and Value hold all the available electricity generation, capacity, emissions, import and demand data. The last two columns are the change calculations and we are not going to use them here.
Let’s start with some basic exploring. For example, I am interested to know what categories are listed under the Subcategory column. A simple way to find this information is to find the unique values in the column, as below:
df['Subcategory'].unique()
>>> array(['Aggregate fuel', 'Fuel', 'Demand', 'Demand per capita', 'Total', 'Electricity imports', 'CO2 intensity'], dtype=object)
Another exploration step is to filter the Subcategory column so that it only shows Demand per capita. I can simply do it as below:
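A minimal sketch of this kind of filter, using a boolean mask. The small `sample` DataFrame is a made-up stand-in for `df`: the column names follow the dataset, but the values are invented for illustration.

```python
import pandas as pd

# Tiny stand-in for the Ember table; the values are invented for illustration.
sample = pd.DataFrame({
    'Area': ['Australia', 'Australia', 'Norway', 'Norway'],
    'Year': [2020, 2020, 2020, 2020],
    'Subcategory': ['Demand', 'Demand per capita', 'Demand', 'Demand per capita'],
    'Value': [250.0, 10.0, 130.0, 24.0],
})

# Boolean mask: True where the row belongs to the subcategory of interest.
mask = sample['Subcategory'] == 'Demand per capita'

# Indexing with the mask keeps only the rows where it is True.
per_capita = sample[mask]
print(per_capita)
```

The same pattern should work on the full table by swapping `sample` for `df`.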
Exploring the data for a specific country is just as simple. Just like above, I can see the data related to Australia as below:
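A minimal sketch of a country filter, again on an invented stand-in table (`sample` and its values are assumptions, not the real data). It also shows how `.loc` can pick the columns of interest in the same step.

```python
import pandas as pd

# Made-up stand-in for the full table; values are for illustration only.
sample = pd.DataFrame({
    'Area': ['Australia', 'Norway', 'Australia', 'Norway'],
    'Year': [2020, 2020, 2021, 2021],
    'Subcategory': ['Demand'] * 4,
    'Value': [250.0, 130.0, 255.0, 135.0],
})

# Keep every column for the rows where Area is Australia.
australia = sample[sample['Area'] == 'Australia']

# Same row filter, but keeping only the Year and Value columns.
australia_slim = sample.loc[sample['Area'] == 'Australia', ['Year', 'Value']]
print(australia_slim)
```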
In the next step, I am going to explore the Demand per capita and the total Demand for each country further. Once I start showing the Demand per capita, it will become obvious why we need both these views to get a better understanding of the data. Since this is exploratory work, I am only going to create a view of the data for plotting purposes. The two views are created as below:
df_demand_per_capita_MWh = df[(df['Subcategory']=='Demand per capita') & (df['Area type']=='Country')][['Year', 'Area', 'Value']].reset_index(drop=True)
df_demand_TWh = df[(df['Subcategory']=='Demand') & (df['Area type']=='Country')][['Year', 'Area', 'Value']].reset_index(drop=True)
Here, using two conditions, I am selecting Demand per capita (or Demand) while keeping only the rows where Area type is Country. From the resulting table I am selecting the Year, Area and Value columns. I am also asking Pandas to reset the index to make this new set easier to wrangle.
We can go ahead and check the head of the data again to make sure the outcome is as expected. Here I use a combination of head and tail to explore rows further down the table:
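One possible way to combine head and tail is to chain them, which gives a window of rows from the middle of the table. This is a sketch on an invented table (`view` and its values are assumptions); the exact call used for the original output may have been different.

```python
import pandas as pd

# Stand-in long table: 3 made-up areas x 10 years, invented values.
years = list(range(2000, 2010))
view = pd.DataFrame({
    'Year': years * 3,
    'Area': ['A'] * 10 + ['B'] * 10 + ['C'] * 10,
    'Value': [float(i) for i in range(30)],
})

# head(15) keeps rows 0 to 14; tail(5) then keeps the last five of those,
# so the result is rows 10 to 14 of the original table.
window = view.head(15).tail(5)
print(window)
```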
As you can see, the name of my DataFrame carries the name of the table and the unit system. I prefer naming like this. The benefit of this naming technique is that I can remove the unit and name columns from the dataset, making it lighter and easier to wrangle.
Now, looking at this table, we have a long table, which is not the easiest shape to slice for Time Series data. It would be much easier if each Country was a column by itself, with Year being the index and the values (demand per capita, MWh) as the cell values. Pandas has an easy to use pivot function to help us with that:
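A minimal sketch of the long-to-wide reshape with `DataFrame.pivot`. The `long_view` table and its values are invented for illustration; the column names match the view created earlier.

```python
import pandas as pd

# Made-up long table with the same three columns as the view above.
long_view = pd.DataFrame({
    'Year': [2020, 2021, 2020, 2021],
    'Area': ['Australia', 'Australia', 'Norway', 'Norway'],
    'Value': [10.0, 10.2, 24.0, 24.5],
})

# Year becomes the index, each Area becomes a column,
# and Value fills the cells.
wide = long_view.pivot(index='Year', columns='Area', values='Value')
print(wide)
```

Note that `pivot` raises an error if the same Year and Area pair appears more than once. In that case `pivot_table`, which aggregates duplicates, may be the better fit.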
Now that we have the data in the shape we are happy with, we can start the plotting journey. Python has plenty of libraries for Visualisation. For this tutorial I chose Matplotlib as it is generally the one that everyone starts on.
For the demand, let’s start with plotting the World’s total demand. For this I sum the demand on each row, which would be the total demand for all the countries for each year. The code and the chart would look like this:
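A minimal sketch of the row-wise sum and the line plot, assuming the demand data has already been pivoted into a wide table with Year as the index. The name `demand_wide` and its values are invented for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up wide table: Year index, one column per country, values in TWh.
demand_wide = pd.DataFrame(
    {'Country A': [100.0, 110.0, 120.0], 'Country B': [50.0, 55.0, 53.0]},
    index=pd.Index([2019, 2020, 2021], name='Year'),
)

# axis=1 sums across the columns, giving one total per year.
world_total = demand_wide.sum(axis=1)

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(world_total.index, world_total.values)
ax.set_ylabel('Demand (TWh)')
ax.set_title('Total electricity demand')
```

By default `sum` skips missing values, so a year in which fewer countries have reported will show a smaller total.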
If my x-axis is Year or Date, I generally prefer not to label it, as it is obvious to the reader and a label only takes valuable space from my plot area.
A quick look at the plot and the following stand out to me:
The overall trend
The dip in electricity demand during COVID and GFC
The dip in 2022.
The overall growing trend screams how the demand is soaring.
The Covid and GFC dips are very interesting to me. They show how fundamental Electricity, and by extension energy, is to our lives. Any change in any aspect of our lives shows itself in our Energy (Electricity) consumption.
For the dip in 2022, further exploration shows that the dip is not real. A lot of data is missing for the year 2022, so the same dataset next year should give us a more realistic picture for 2022.
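One way to check for this kind of gap is to count how many countries have a value in each year. This is a sketch on an invented wide table (`demand_wide` and its values are assumptions), not the exact check I ran.

```python
import numpy as np
import pandas as pd

# Made-up wide table where two countries have not reported for 2022.
demand_wide = pd.DataFrame(
    {
        'Country A': [100.0, 110.0, 120.0],
        'Country B': [50.0, 55.0, np.nan],
        'Country C': [20.0, 21.0, np.nan],
    },
    index=pd.Index([2020, 2021, 2022], name='Year'),
)

# Number of countries with a value in each year.
reporting = demand_wide.notna().sum(axis=1)

# Number of countries with no value in each year.
missing = demand_wide.isna().sum(axis=1)

print(pd.DataFrame({'reporting': reporting, 'missing': missing}))
```

A sudden drop in the reporting count for the latest year is a hint that a dip in the total is a data artefact rather than a real change.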
So while simple plots can reveal interesting information, we should be careful to not read too much into any of the plots. We will see another example of this when we explore demand per capita.
Now that we have seen the global demand, it would be interesting to see which countries have the highest demand. To make it easier to draw conclusions, I sort the dataset so that the countries with the highest demand come first, and then plot their demand. In this example, I am not assigning the same Y-axis range to all the countries, so that we can see their trends over the years. If I had given all the countries the same range based on the highest demand, the scale would have made it very difficult to see any trend in most countries beyond the first handful.
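A minimal sketch of the sort-then-plot idea with one small chart per country and independent Y-axes. The table `demand_wide` and its values are invented, and ranking by each country's maximum demand over the period is my assumption here; other rankings (latest year, average) are equally possible.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up wide table: Year index, one column per country, values in TWh.
demand_wide = pd.DataFrame(
    {
        'Country A': [10.0, 11.0, 12.0],
        'Country B': [300.0, 350.0, 400.0],
        'Country C': [90.0, 90.0, 91.0],
        'Country D': [40.0, 38.0, 45.0],
    },
    index=pd.Index([2019, 2020, 2021], name='Year'),
)

# Rank countries by their maximum demand over the period (an assumption).
ranked = demand_wide.max().sort_values(ascending=False).index

# Keep the top 3 here; the same slice with 20 would give a top 20.
top = demand_wide[ranked[:3]]

# sharey=False lets every chart choose its own Y-axis range.
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharex=True, sharey=False)
for ax, country in zip(axes, top.columns):
    ax.plot(top.index, top[country])
    ax.set_title(country)
fig.tight_layout()
```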
This code yields 215 plots, one for each country. I will only include the first 20 as an example:
What we see here is that China not only holds the top spot but is also going through a massive increase in demand. While the USA, in 2nd spot, has had flat demand over the reported period, India, in 3rd spot, is following the Chinese trend. We can see that Brazil (8th) and South Korea (10th) are two other top 10 countries with increasing demand.
China and India, however, have huge populations, and this demand makes sense if we are talking about >25% of the earth’s population. So what if we plot the same chart, but this time adjusted for population? Here is how the top 20 looks for demand per capita:
At least for me, the first time I looked at this, the results were very interesting and surprising. However, after thinking about it for a bit, it started to make sense.
In these charts, Iceland’s demand is very high, twice as big as that of the next country. Before ruling it out as an outlier that needs elimination, I did a quick search to confirm the validity of the data. According to the methodology document, the source of this data is Eurostat. This article explains these numbers in a bit more detail.
Another interesting trend is that many of the top 20 countries are those in harsher climates (extreme cold or hot) without huge populations but with a heavy reliance on air conditioning. Also, interestingly, China is in 77th spot in this list and India is further down in 143rd spot, below countries like Palestine (139), Cuba (125) and Argentina (85).
As you can see, we have only started to scratch the surface of this dataset. It has already been a long tutorial and I may come back to it in another post.
In summary, in this tutorial we saw how to import a CSV dataset, get familiar with the data, do some basic arithmetic on it and create some basic plots. If you spend more time with this dataset and find other insights in it, please share them in the comments, or simply reply with an email. Also, if you would like the Notebook carrying all this code, please message me and I will share it with you.
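Positions like these can be looked up with a rank over the per capita table. This is a sketch on an invented wide table (`per_capita_wide` and its values are assumptions), and ranking by each country's maximum over the period is also an assumption.

```python
import pandas as pd

# Made-up wide table: Year index, one column per country, values in MWh.
per_capita_wide = pd.DataFrame(
    {
        'Country A': [5.0, 5.5],
        'Country B': [50.0, 52.0],
        'Country C': [1.0, 1.2],
        'Country D': [12.0, 11.0],
    },
    index=pd.Index([2020, 2021], name='Year'),
)

# One number per country: its highest per capita demand over the period.
peak = per_capita_wide.max()

# Rank 1 is the highest value; ties share the lowest rank number.
rank = peak.rank(ascending=False, method='min').astype(int)
print(rank.sort_values())
```

Looking up a single country is then just `rank['Country D']`.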