Tracking Air Quality with HDP and HDF: Part 1 - Apache NiFi Ingest

Part 2: Plan Data Storage. Store to Apache Hive, Apache Druid and Apache HBase.

Part 3: Query and Visualize Data with Apache Zeppelin and Superset


There was an air quality alert near me a few days ago, and I was curious how I could keep track of this important environmental information. So NiFi! This data is different from weather data, but for analytics it makes a lot of sense to combine it with weather data, social media data and locally captured camera images. It's very easy to ingest this JSON data and these camera images via Apache NiFi. In the next section we will analyze the datasets and determine how to aggregate and accumulate massive quantities of this data to track air quality in various areas over time, and use that as a dimension alongside other relevant data like weather.

We are tracking contaminants and particles in the air.

These include:

  • pm25, pm10 - atmospheric particulate matter (particles up to 2.5 and 10 micrometers in diameter)
  • so2 - sulfur dioxide
  • no2 - nitrogen dioxide
  • o3 - ozone
  • co - carbon monoxide

Photos Courtesy of HazeCam - Brigantine, NJ



Example Data

{"location":"ARB OER","city":"CA8 - ARB","country":"US","distance":3848728.319714322,"measurements":[{"parameter":"pm25","value":-4,"lastUpdated":"2016-08-08T16:00:00.000Z","unit":"µg/m³","sourceName":"AirNow","averagingPeriod":{"value":1,"unit":"hours"}}],"coordinates":{"latitude":38.568504,"longitude":-121.493256}}

{
  "location" : "MONTG",
  "parameter" : "o3",
  "date" : {
    "utc" : "2018-05-05T12:00:00.000Z",
    "local" : "2018-05-05T06:00:00-06:00"
  },
  "value" : 0.004,
  "unit" : "ppm",
  "coordinates" : {
    "latitude" : 32.4069,
    "longitude" : -86.2564
  },
  "country" : "US",
  "city" : "Montgomery"
}

Most of the data arrives as JSON arrays, so we can easily break them down into individual JSON records, derive an Apache Avro schema from that data and then process it as we want. We can join the records together and then convert them into ORC files or HBase rows.
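To make the split-and-derive idea concrete, here is a minimal sketch in plain Python. It is not the NiFi flow itself, only an illustration of the same two steps outside NiFi. The function names (split_records, infer_avro_schema), the record name AirQuality and the type mapping are assumptions made for this example; the sample payload is the first example record above wrapped in an array.

```python
import json

# First example record from above, wrapped in a JSON array the way a
# multi-location response would be (the wrapping is an assumption).
RAW = """[
  {"location": "ARB OER", "city": "CA8 - ARB", "country": "US",
   "distance": 3848728.319714322,
   "measurements": [
     {"parameter": "pm25", "value": -4,
      "lastUpdated": "2016-08-08T16:00:00.000Z", "unit": "µg/m³",
      "sourceName": "AirNow",
      "averagingPeriod": {"value": 1, "unit": "hours"}}
   ],
   "coordinates": {"latitude": 38.568504, "longitude": -121.493256}}
]"""


def split_records(payload):
    """Yield one flat dict per measurement from an array of location records."""
    for loc in json.loads(payload):
        # Fields shared by every measurement taken at this location.
        base = {
            "location": loc["location"],
            "city": loc["city"],
            "country": loc["country"],
            "latitude": loc["coordinates"]["latitude"],
            "longitude": loc["coordinates"]["longitude"],
        }
        for m in loc["measurements"]:
            yield {
                **base,
                "parameter": m["parameter"],
                # Cast to float so integer readings such as -4 and
                # fractional readings such as 0.004 share one type.
                "value": float(m["value"]),
                "unit": m["unit"],
                "lastUpdated": m["lastUpdated"],
            }


# Hypothetical, deliberately small mapping from Python types to Avro primitives.
AVRO_TYPES = {str: "string", int: "long", float: "double", bool: "boolean"}


def infer_avro_schema(record, name="AirQuality"):
    """Build a simple Avro record schema from one flat sample record."""
    fields = [{"name": k, "type": AVRO_TYPES[type(v)]} for k, v in record.items()]
    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    records = list(split_records(RAW))
    print(json.dumps(infer_avro_schema(records[0]), indent=2))
```

Inferring a schema from a single sample is fragile: a field that happens to be an integer in the sample but fractional elsewhere would be typed too narrowly, which is why the sketch casts the reading to a float before inference. Optional or missing fields would need nullable union types, which this sketch does not handle.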







Data Feed Links

HazeCam (provides web camera images of potential haze)

OpenAQ (provides open air quality data)

AirNow API (provides forecasts and current conditions)

EPA's Air Quality Notifications

Other Sources

Last update: 08-17-2019 07:36 AM