With the changing climate, California is witnessing more pronounced wildfires with every passing year. The 10 largest fires in California since 1932 have occurred since 2000. 2021 was yet another year of intense and frequent wildfires for the United States. For this analysis, we will explore the California and Oregon fires using Blue Sky’s Zuri datasets.
Introducing Shape Explorer
Before we can query anything from the Blue Sky APIs, we need the shape IDs for the relevant regions. To make this easier, we are introducing Shape Explorer, which is available in SpaceTime. You can open it from the More dropdown in the right corner.
The goal behind Shape Explorer is to enable the use of consistent shape IDs, each of which is unique and represents a specific boundary at a given administrative level. This avoids the inconsistencies that arise when place names vary from person to person, making the system more robust and objective.
This tutorial assumes that the user has Python and Jupyter Notebook installed. For detailed instructions on installing or setting up Python or Jupyter Notebook, please refer to our previous tutorial.
Setting up the environment
Fire up Jupyter Notebook in the terminal by typing jupyter notebook.
💡 Note: As we will download more data, Jupyter Notebook may throw an IOPub data rate exceeded error. So, it is better to run the command below from your terminal right at the start!
jupyter notebook --NotebookApp.iopub_data_rate_limit=1.0e10
Create a new Python 3 notebook and run the following commands in Jupyter Notebook. The json and calendar modules ship with Python, so they do not need to be installed.
pip install requests
pip install pandas
pip install matplotlib
pip install plotly
pip install seaborn
To ensure all the plots from this analysis are correctly reproduced, the following packages will be required.
# This package helps you reach out to the API
import requests
# This package helps you work with (and pretty-print) data in JSON format
import json
# Required to reproduce the plots included in this tutorial
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import calendar
from matplotlib.dates import DateFormatter
%matplotlib inline
Fetching the data
After the packages have been imported, the next step is to determine the endpoint that will fetch the GHG emissions data. This can be found in the Dev Portal docs.
https://gateway.blueskyhq.io/api/zuri/fires?api-key={INSERT YOUR KEY HERE}
&shapeId={shapeId}
&startDate={startDate}
&endDate={endDate}
&raw={raw}
&keys={keys}
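As a rough illustration of how the placeholders above turn into a query string, the sketch below assembles the URL with Python's standard library. The helper name build_fires_url and the default values are our own assumptions for illustration, not part of the Blue Sky API; the key is a placeholder.

```python
from urllib.parse import urlencode

# Base endpoint copied from the template above.
BASE_URL = "https://gateway.blueskyhq.io/api/zuri/fires"

def build_fires_url(api_key, shape_id, start_date, end_date, raw="false", keys="frp"):
    """Assemble the request URL from the query parameters (illustrative helper)."""
    query = {
        "api-key": api_key,
        "shapeId": shape_id,
        "startDate": start_date,
        "endDate": end_date,
        "raw": raw,
        "keys": keys,
    }
    # urlencode percent-escapes characters such as ':' and ',' in the values.
    return f"{BASE_URL}?{urlencode(query)}"

url = build_fires_url(
    api_key="YOUR_KEY",  # placeholder, replace with your own key
    shape_id="1cca7a4d-7f32-4de8-a5d1-23281bf21753",
    start_date="2018-01-01T00:00:00.000Z",
    end_date="2021-12-31T00:00:00.000Z",
    keys="frp,co2_emission_estimate",
)
print(url)
```

In practice, passing a params dictionary to requests.get does the same encoding for you, which is the approach this tutorial follows.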
California and Oregon fires from 2018 to 2021
params_california = {
"api-key": "#INSERT YOUR KEY HERE",
"shapeId": "1cca7a4d-7f32-4de8-a5d1-23281bf21753", #INSERT SHAPE ID OF INTEREST
"startDate": "2018-01-01T00:00:00.000Z",
"endDate": "2021-12-31T00:00:00.000Z",
"raw": "false",
"adminLevelBucket":"3", # "3" will fetch the data for California
"keys": "frp,co2_emission_estimate",
}
The above parameters will fetch frp (average) and co2_emission_estimate data of fires for the Shape ID (here California ~ 1cca7a4d-7f32-4de8-a5d1-23281bf21753) between the 1st of January 2018 and the 31st of December 2021. By default, it uses VIIRS data and we will continue this tutorial with the same.
adminLevelBucket helps you bucket the data spatially. For example, if we have to find GHG emissions for all the states in the US, we would use the shapeId of USA and adminLevelBucket equal to 3. The API would then return the data, spatially bucketed for each state in the US. The table below helps explain how to use the adminLevelBucket parameter.
Levels | Admin Region |
---|---|
0 | World |
1 | Continent |
2 | Country |
3 | State |
4 | District/County |
5 | Electoral/Sub-District |
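To avoid remembering the numeric levels, the table can be mirrored in a small lookup. This is only a sketch: the dictionary keys and the helper bucketed_params are hypothetical names of our own, and the shape ID is a placeholder.

```python
# Admin levels as listed in the table above.
ADMIN_LEVELS = {
    "world": 0,
    "continent": 1,
    "country": 2,
    "state": 3,
    "district": 4,
    "sub_district": 5,
}

def bucketed_params(base_params, level_name):
    """Return a copy of base_params with adminLevelBucket set from a readable name."""
    if level_name not in ADMIN_LEVELS:
        raise ValueError(f"Unknown admin level: {level_name!r}")
    params = dict(base_params)
    # The tutorial passes the level as a string, so we do the same here.
    params["adminLevelBucket"] = str(ADMIN_LEVELS[level_name])
    return params

usa_by_state = bucketed_params({"shapeId": "<USA shape ID>", "raw": "false"}, "state")
print(usa_by_state["adminLevelBucket"])
```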
california_data = requests.get("https://gateway.blueskyhq.in/api/zuri/fires?", params = params_california).json()
california = pd.DataFrame(california_data['data'])
Similarly, fire emissions data can be downloaded for Oregon using Oregon’s Shape ID.
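The plots in this section rely on year, month and States columns, which the raw response may not provide directly. Below is a minimal sketch of one way to derive them, assuming each record carries datetime, fires and co2_emission_estimate fields (the same fields used for the USA data later on). The helper prepare_state_frame is a hypothetical name and the sample records are invented.

```python
import pandas as pd

def prepare_state_frame(records, state_name):
    """Build a tidy DataFrame from API-style records and tag it with a state label."""
    df = pd.DataFrame(records)
    # Values may arrive as strings, so coerce them to numbers.
    df["fires"] = pd.to_numeric(df["fires"])
    df["co2_emission_estimate"] = pd.to_numeric(df["co2_emission_estimate"])
    timestamps = pd.to_datetime(df["datetime"])
    df["year"] = timestamps.dt.year
    df["month"] = timestamps.dt.month
    df["States"] = state_name
    return df

# Made-up records, only to show the shape of the result.
sample = [
    {"datetime": "2021-07-01T00:00:00.000Z", "fires": "12", "co2_emission_estimate": "3400.5"},
    {"datetime": "2020-08-15T00:00:00.000Z", "fires": "7", "co2_emission_estimate": "1200.0"},
]
california_demo = prepare_state_frame(sample, "California")
print(california_demo[["year", "month", "fires", "States"]])
```

Applying the same helper to the Oregon response with the label "Oregon" would give two frames that can be concatenated and plotted with hue = 'States'.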
# Combining the two datasets to create a new one
combined = pd.concat([california, oregon])
sns.set()
sns.barplot(x = 'year', y = 'fires', hue = 'States', data = combined, estimator= sum)
plt.xlabel('Year')
plt.ylabel('Fires')
plt.title('California and Oregon fires between 2018 and 2021')
plt.tight_layout()
plt.show()
sns.set()
sns.barplot(x = 'year', y = 'co2_emission_estimate', hue = 'States', data = combined, estimator = sum)
plt.yticks([50000000, 100000000, 150000000, 200000000, 250000000, 300000000],['50M','100M','150M', '200M', '250M', '300M'])
plt.xlabel('Year')
plt.ylabel('$CO_{2}$ Emissions Estimate (in tonnes)')
plt.title('California and Oregon $CO_{2}$ emissions (in tonnes) from fires \n between 2018 and 2021')
plt.tight_layout()
plt.show()
GHG Emissions from California and Oregon for 2021
For the rest of the tutorial, we will focus on analysing the 2021 fires in California and Oregon.
Let’s get 2021 fires data for California and Oregon from our combined dataframe.
comb_2021 = combined[combined["year"]==2021]
comb_2021 = comb_2021.groupby(["States", "month"]).agg({
"fires": "sum",
"co2_emission_estimate": "sum"
}).reset_index()
Next, we create barplots using seaborn to visualize the month-wise distribution of 2021 fires and emissions in California and Oregon.
sns.set()
sns.barplot(x = 'month', y = 'fires', hue = 'States', data = comb_2021, estimator= sum)
plt.xlabel('Month')
plt.ylabel('Fires')
plt.title('California and Oregon fires in 2021')
plt.tight_layout()
plt.show()
sns.set()
sns.barplot(x = 'month', y = 'co2_emission_estimate', hue = 'States', data = comb_2021, estimator= sum)
plt.yticks([25000000,50000000, 75000000, 100000000, 125000000, 150000000, 175000000],['25M','50M','75M','100M','125M', '150M', '175M'])
plt.xlabel('Month')
plt.ylabel('$CO_{2}$ Emissions Estimate (in tonnes)')
plt.title('California and Oregon $CO_{2}$ emissions (in tonnes) from fires \n in 2021')
plt.show()
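Month numbers on the x-axis can be harder to read than names. The calendar module imported earlier can translate them; the sketch below assumes a month column holding integers from 1 to 12 and uses an invented stand-in for comb_2021. The abbreviations shown in the comment are what an English locale produces.

```python
import calendar
import pandas as pd

# Tiny stand-in for comb_2021; the values are invented for illustration.
demo = pd.DataFrame({
    "States": ["California", "California", "Oregon"],
    "month": [7, 8, 7],
    "fires": [100, 150, 60],
})

# calendar.month_abbr[1] is 'Jan', ..., calendar.month_abbr[12] is 'Dec' (English locale).
demo["month_name"] = demo["month"].apply(lambda m: calendar.month_abbr[int(m)])

# An ordered categorical keeps the months in calendar order when plotting.
month_order = [calendar.month_abbr[m] for m in range(1, 13)]
demo["month_name"] = pd.Categorical(demo["month_name"], categories=month_order, ordered=True)
print(demo)
```

Passing x = 'month_name' to sns.barplot would then label the bars with month names instead of numbers.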
Further Analysis for the United States of America
Fires in the United States between 2018 and 2021
The procedure for getting the data using the Shape ID remains the same as for California and Oregon.
usa['fires'] = pd.to_numeric(usa['fires'])
usa['co2_emission_estimate'] = pd.to_numeric(usa['co2_emission_estimate'])
usa["year"] = pd.to_datetime(usa["datetime"]).dt.year
usa['date'] = pd.to_datetime(usa["datetime"]).dt.date
usa.index = pd.to_datetime(usa.index)
usa_18_21 = usa.groupby("year").agg({"fires": "sum", "co2_emission_estimate": "sum"}).reset_index()
💡 We are looking forward to including timeBucket as a parameter, as it will enable getting the data in monthly, yearly and many more temporal formats!
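Until such a parameter is available, similar bucketing can be approximated on the client side with pandas. The following is a sketch on invented daily records, assuming a datetime column and a numeric fires column.

```python
import pandas as pd

# Invented daily records standing in for the API response.
daily = pd.DataFrame({
    "datetime": ["2021-07-01", "2021-07-02", "2021-08-01", "2020-08-01"],
    "fires": [10, 5, 7, 3],
})
daily["datetime"] = pd.to_datetime(daily["datetime"])

# Monthly buckets: group on the calendar month each timestamp falls in.
monthly = daily.groupby(daily["datetime"].dt.to_period("M"))["fires"].sum()

# Yearly buckets: group on the year only.
yearly = daily.groupby(daily["datetime"].dt.year)["fires"].sum()

print(monthly)
print(yearly)
```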
Let’s create a simple line plot using plotly that will help us visualize how the fires have evolved from 2018 till 2021. The yearly totals in usa_18_21 cover the 4 years of interest.
fig = go.Figure(go.Scatter(
x = usa_18_21["year"],
y = usa_18_21["fires"], ))
fig.update_layout(
xaxis = dict(
tickmode = 'array',
tickvals = [2018, 2019, 2020, 2021],
ticktext = ['2018', '2019', '2020', '2021']
)
)
fig.update_layout(
title="Fires in USA from 2018 - 2021",
xaxis_title="Year",
yaxis_title="Fire Count",
font=dict(
family="Times New Roman"
)
)
fig.show()
Similarly, we can visualize the evolution of CO2 emissions in the USA.
Fires distribution for all the US states
params_states = {
"api-key": "#INSERT YOUR KEY HERE",
"shapeId": "b96447fb-f7af-4bfa-89d7-13b84eb18b87", #INSERT SHAPE ID OF INTEREST
"startDate": "2018-01-01T00:00:00.000Z",
"endDate": "2021-12-31T00:00:00.000Z",
"raw": "false",
"adminLevelBucket":"3", ## 3 will fetch the data for all states
"keys": "co2_emission_estimate",
}
usastates = requests.get("https://gateway.blueskyhq.in/api/zuri/fires?", params = params_states).json()
print(json.dumps(usastates, indent=2))
us_states = pd.DataFrame(usastates['data'])
Let us now see which states contribute the most to GHG emissions, using a query that keeps only rows where co2_emission_estimate is at least 5 million tonnes.
fig2 = px.scatter(us_states.query("co2_emission_estimate >= 5000000"), x="fires", y="co2_emission_estimate",
size="fires", color="name",
hover_name="name", log_x=True, size_max=60, title="Top US States contributing to GHG emissions from Fires from 2018 - 2021")
fig2.update_layout(
font_family="Times New Roman"
)
fig2.show()
From the figure above, we can conclude that the states that contributed most to the GHG emissions between 2018 and 2021 were California, Oregon, Alaska and Colorado.
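The same ranking can be read off a table instead of a chart. The sketch below assumes the response has one or more rows per state with name and co2_emission_estimate columns, as used in the scatter plot; the rows here are invented.

```python
import pandas as pd

# Invented per-state rows; 'name' and the emission column follow the tutorial's usage.
demo_states = pd.DataFrame({
    "name": ["A", "A", "B", "C"],
    "co2_emission_estimate": [4_000_000, 3_000_000, 1_000_000, 6_000_000],
})

# Sum emissions per state, keep those at or above 5 million tonnes, largest first.
totals = demo_states.groupby("name")["co2_emission_estimate"].sum()
top_states = totals[totals >= 5_000_000].sort_values(ascending=False)
print(top_states)
```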
Key Takeaways
- In the USA, the total number of fires increased by 20% and CO2 emissions by 28% between 2018 and 2021.
- California and Oregon contribute enormously to the GHG emissions resulting from fires. In 2021, the USA emitted 423.6 million tonnes of CO2, of which 225 million tonnes (around 53%) came from California and Oregon alone.
- In 2021, California and Oregon combined witnessed almost 250K fires, resulting in a whopping 250M tonnes of CO2 emissions.
- Most fires were seen from July to September when the temperature is already very high.
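The percentages in these takeaways come from simple ratios. As a quick check, the sketch below recomputes the California and Oregon share from the two totals quoted above; the helper names are our own, and the fire counts in the second call are hypothetical values chosen only to show how a percentage change is computed.

```python
def share_percent(part, whole):
    """Percentage that `part` represents of `whole`."""
    return 100.0 * part / whole

def percent_change(old, new):
    """Relative change from `old` to `new`, in percent."""
    return 100.0 * (new - old) / old

# Figures quoted in the takeaways (million tonnes of CO2 in 2021).
usa_2021 = 423.6
ca_or_2021 = 225.0
print(round(share_percent(ca_or_2021, usa_2021), 1))  # about 53

# Hypothetical counts, only to show how a 20% rise would be computed.
print(percent_change(100_000, 120_000))
```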