Imagine this. You live in Patna. With winter arriving, you notice that pollution levels seem to rise in the city. It is only a hunch, and you cannot confirm it because there are no air quality monitors in your neighbourhood. You want to find out how bad air pollution is in the city right now and whether it warrants more attention. How do you find this out?

In this tutorial, you will learn how to find PM2.5 data on our Developer Portal and use it.

Prerequisites

They are as follows:

  • Python

  • Jupyter Notebook

  • Knowing how to use our Developer Portal

If you need help with any of these, you can refer to this tutorial, where we have explained all of this in detail.

Introducing SpatialAQ

If you live in India, you know air pollution is a big problem in parts of the country. However, what you probably do not know is how bad it is. The reason is quite simple - as a country, we lack sufficient ground monitors to truly understand the situation. (Read here for more details)

To address this gap, we developed SpatialAQ, an air quality dataset generated in-house from satellite data and proprietary ML models, which gives us high-resolution air quality data for all of India!

PM2.5 levels across India using SpatialAQ on SpaceTime


Setting up API calls

Start your Jupyter notebook by opening the Terminal and typing jupyter notebook.

  • Once the notebook opens up, start by importing a few Python packages:

    # requests sends the HTTP request to the API
    import requests
    # json pretty-prints the response so that it is easier to read
    import json
    
  • Once the packages are imported, review the documentation to determine the endpoint for getting SpatialAQ data. According to our documentation, the data you are looking for is at the below-mentioned endpoint.

    https://gateway.blueskyhq.in/api/breezo/spatial
    ?api-key={INSERT YOUR KEY HERE}
    &product={product}
    &shapeId={shapeId}
    &region={region}
    &regionType={regionType}
    &duration={duration}
    
  • Store the parameters for the endpoint in a dictionary, replacing the placeholder with your own API key.

    mydict={
      'api-key': 'INSERT YOUR KEY HERE',
      'product': 'pm25',
      'region': 'patna',
      'regionType': 'district',
      'duration': '1d'
    }
    
  • Now make your API call

    #Making an API call and displaying the output.
    request=requests.get("https://gateway.blueskyhq.in/api/breezo/spatial", params = mydict).json()
    print(json.dumps(request, indent=2))
    
  • You will see output similar to the following. This is the result of the API call you just made.

    {
      "data": [
        {
          "datetime": "2022-01-24T00:00:00.000Z",
          "pm25": 103.87614592320966
        }
      ],
      "meta": {
        "duration": "1d",
        "region": "patna",
        "regionType": "district"
      }
    }
    
    

If you receive data in the format above, you have passed the parameters correctly. You now have today's PM2.5 value for Patna. The next section widens the request to a full month.
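
Before moving on, it can help to pull the reading out of the response rather than just printing it. The following is a minimal sketch that works on a copy of the sample output shown above; the helper name latest_reading is hypothetical and not part of the API, and the code assumes the response always has the "data" list structure shown in this tutorial.

```python
import json

# Sample response copied from the output shown above.
sample_response = json.loads("""
{
  "data": [
    {"datetime": "2022-01-24T00:00:00.000Z", "pm25": 103.87614592320966}
  ],
  "meta": {"duration": "1d", "region": "patna", "regionType": "district"}
}
""")


def latest_reading(response):
    """Return (datetime, pm25) for the last record, or None if there is no data.

    Hypothetical helper: it assumes the response layout shown in this tutorial.
    """
    records = response.get("data", [])
    if not records:
        return None
    last = records[-1]
    return last["datetime"], last["pm25"]


reading = latest_reading(sample_response)
if reading is not None:
    when, pm25 = reading
    print(f"{when}: PM2.5 = {pm25:.1f}")
```

In the notebook, you would pass the parsed result of your own API call to the helper instead of the sample.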

Making the API call for one month of air quality data

Let us repeat the above process to get data for Patna for a month.

# Pass the same parameters as before, this time asking for one month of data
mydict={
  'api-key': 'INSERT YOUR KEY HERE',
  'product': 'pm25',
  'region': 'patna',
  'regionType': 'district',
  'duration': '1m'
}

endpoint="https://gateway.blueskyhq.in/api/breezo/spatial"
patna_request1m=requests.get(endpoint,params=mydict).json()
print(json.dumps(patna_request1m, indent=2))

Notice that we have replaced 1d with 1m, which represents a duration of one month. The above code returns the PM2.5 levels for Patna for a whole month.

{
  "data": [
    {
      "datetime": "2021-12-31T00:00:00.000Z",
      "pm25": 95.47887185534591
    },
    {
      "datetime": "2022-01-01T00:00:00.000Z",
      "pm25": 94.78063853948288
    },
    ...
    {
      "datetime": "2022-01-29T00:00:00.000Z",
      "pm25": 117.39931647449336
    }
  ],
  "meta": {
    "duration": "1m",
    "region": "patna",
    "regionType": "district"
  }
}
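
A quick sanity check on a response like this is to count the records and confirm the date range they cover. The sketch below uses only the three records visible in the output above (the elided ones are left out), so the record count is illustrative; the timestamp format string is an assumption based on the values shown.

```python
from datetime import datetime

# Only the three records visible in the output above; the rest are elided there.
records = [
    {"datetime": "2021-12-31T00:00:00.000Z", "pm25": 95.47887185534591},
    {"datetime": "2022-01-01T00:00:00.000Z", "pm25": 94.78063853948288},
    {"datetime": "2022-01-29T00:00:00.000Z", "pm25": 117.39931647449336},
]


def parse_timestamp(text):
    """Parse timestamps in the format seen in the sample output (assumed fixed)."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")


dates = [parse_timestamp(r["datetime"]).date() for r in records]
# Inclusive number of calendar days between the first and last record.
span_days = (max(dates) - min(dates)).days + 1
print(f"{len(records)} records covering {min(dates)} to {max(dates)} ({span_days} days)")
```

With the full response, you would build the list from patna_request1m['data'] instead of typing the records by hand.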

Analyzing the air quality data

The above output is in JSON. You can do basic analysis by converting it from JSON format to a table. For this, you will need to import another python package called pandas.

import pandas as pd

You will now store the above JSON data in a pandas dataframe/table.

df= pd.DataFrame(patna_request1m['data'])

Once this data is loaded onto a pandas dataframe, you can query it to find insights, as demonstrated below.

  • Most polluted day in the above duration

    # This code finds the day with the highest PM2.5 value
    df.loc[df['pm25'].idxmax()]
    
    datetime    2022-01-06T00:00:00.000Z
    pm25                      206.682041
    Name: 6, dtype: object
    
  • Cleanest air day

    # This code finds the day with the lowest PM2.5 value
    df.loc[df['pm25'].idxmin()]
    
    datetime    2022-01-02T00:00:00.000Z
    pm25                       93.053546
    Name: 2, dtype: object
    
  • Average pollution level in this duration

    df['pm25'].mean()
    128.4385918937806
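
As a cross-check, the same three summaries can be reproduced with the Python standard library alone. This sketch again uses only the three records visible in the monthly output above, so its numbers will differ from the full-month results shown in this section.

```python
from statistics import mean

# Only the three records visible in the monthly output; not the full month.
records = [
    {"datetime": "2021-12-31T00:00:00.000Z", "pm25": 95.47887185534591},
    {"datetime": "2022-01-01T00:00:00.000Z", "pm25": 94.78063853948288},
    {"datetime": "2022-01-29T00:00:00.000Z", "pm25": 117.39931647449336},
]

# Equivalent of df.loc[df['pm25'].idxmax()] and df.loc[df['pm25'].idxmin()]
worst = max(records, key=lambda r: r["pm25"])
best = min(records, key=lambda r: r["pm25"])
# Equivalent of df['pm25'].mean()
average = mean(r["pm25"] for r in records)

print("Most polluted day:", worst["datetime"], round(worst["pm25"], 2))
print("Cleanest day:", best["datetime"], round(best["pm25"], 2))
print("Average PM2.5:", round(average, 2))
```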
    

Plotting the data

Summary statistics give us good insight into the data, but sometimes a graph does a better job. Follow these steps to generate one.

  • Import the following packages

    # matplotlib is the graphics package we're using to make fun charts
    import matplotlib.pyplot as plt
    # matplotlib.dates is a small module we're importing to make dates look nice
    import matplotlib.dates as mdates
    # Show charts inline in the notebook
    %matplotlib inline
    
  • Send the above data frame to the matplotlib library to plot a chart.

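    # Convert the timestamp strings to dates so that the x-axis can be formatted
    df['datetime'] = pd.to_datetime(df['datetime'])
    fig, ax = plt.subplots(figsize=(15,7))
    ax.plot(df['datetime'], df['pm25'])
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %d'))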
    

This nifty piece of code quickly generates a chart showing how PM2.5 levels have varied in this duration.

PM2.5 levels for Patna in the month of January 2022


As you have noticed, you were able to get data for the city of Patna in fewer than ten lines of code. You can extend this analysis to other regions and other durations with the same amount of code.
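
One way to make that extension easier is to wrap the parameters in a small helper so that only the region and duration change between calls. This is a sketch under assumptions: build_params is a hypothetical helper, not part of the API; only the values 'patna', '1d' and '1m' appear in this tutorial, so check the documentation for other valid regions and durations. The standard library's urlencode is used here only to show the query string that the parameters produce.

```python
from urllib.parse import urlencode

ENDPOINT = "https://gateway.blueskyhq.in/api/breezo/spatial"


def build_params(api_key, region, duration, region_type="district", product="pm25"):
    """Build the query parameters used in this tutorial (hypothetical helper)."""
    return {
        "api-key": api_key,
        "product": product,
        "region": region,
        "regionType": region_type,
        "duration": duration,
    }


# 'YOUR_KEY' is a placeholder; substitute your own API key.
params = build_params("YOUR_KEY", "patna", "1m")
print(f"{ENDPOINT}?{urlencode(params)}")
```

In the notebook, the dictionary returned by build_params can be passed straight to requests.get as the params argument, exactly as mydict was above.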