
Tracking Air Quality with HDP and HDF: Part 1 - Apache NiFi Ingest

Part 2: Plan Data Storage. Store to Apache Hive, Apache Druid and Apache HBase.

Part 3: Query and Visualize Data with Apache Zeppelin and Superset


There was an Air Quality alert a few days ago near me and I was curious how I could keep track of this important environmental information. So NiFi! This data is different from weather data, but for analytics it makes a lot of sense to combine it with weather data, social data and locally captured camera images. It's very easy to ingest this JSON data and these camera images via Apache NiFi. In the next section we will analyze the datasets and determine how we can aggregate and accumulate massive quantities of this data to track air quality in various areas over time, and use that as a dimension alongside other relevant data like weather.

We are tracking contaminants and particles in the air.

These include:

  • pm25, pm10 - atmospheric particulate matter
  • so2 - sulfur dioxide
  • no2 - nitrogen dioxide
  • o3 - ozone
  • co - carbon monoxide

Photos Courtesy of HazeCam - Brigantine, NJ



Example Data

{"location":"ARB OER","city":"CA8 - ARB","country":"US","distance":3848728.319714322,"measurements":[{"parameter":"pm25","value":-4,"lastUpdated":"2016-08-08T16:00:00.000Z","unit":"µg/m³","sourceName":"AirNow","averagingPeriod":{"value":1,"unit":"hours"}}],"coordinates":{"latitude":38.568504,"longitude":-121.493256}}

{
  "location" : "MONTG",
  "parameter" : "o3",
  "date" : {
    "utc" : "2018-05-05T12:00:00.000Z",
    "local" : "2018-05-05T06:00:00-06:00"
  },
  "value" : 0.004,
  "unit" : "ppm",
  "coordinates" : {
    "latitude" : 32.4069,
    "longitude" : -86.2564
  },
  "country" : "US",
  "city" : "Montgomery"
}

Most of the data arrives as arrays of JSON, so we can easily break it down into individual JSON records, derive an Apache Avro schema from that data and then process it as we want. We can join the records together and then convert them into ORC files or HBase rows.
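To make the split-and-derive step concrete, here is a minimal sketch in plain Python, outside of NiFi, using the first record from the Example Data. It is not the NiFi flow itself. The function names (split_measurements, derive_avro_schema) and the schema name "airquality" are hypothetical, and the type mapping is a simplifying assumption.

```python
import json

# First record from the Example Data above: one location with an array of measurements.
SAMPLE = (
    '{"location":"ARB OER","city":"CA8 - ARB","country":"US",'
    '"distance":3848728.319714322,"measurements":[{"parameter":"pm25",'
    '"value":-4,"lastUpdated":"2016-08-08T16:00:00.000Z","unit":"µg/m³",'
    '"sourceName":"AirNow","averagingPeriod":{"value":1,"unit":"hours"}}],'
    '"coordinates":{"latitude":38.568504,"longitude":-121.493256}}'
)

def split_measurements(doc):
    """Yield one flat record per entry in the 'measurements' array."""
    coords = doc.get("coordinates", {})
    for m in doc.get("measurements", []):
        yield {
            "location": doc.get("location"),
            "city": doc.get("city"),
            "country": doc.get("country"),
            "latitude": coords.get("latitude"),
            "longitude": coords.get("longitude"),
            "parameter": m.get("parameter"),
            "value": m.get("value"),
            "unit": m.get("unit"),
            "lastUpdated": m.get("lastUpdated"),
            "sourceName": m.get("sourceName"),
        }

# Assumed mapping from Python types to Avro primitive types.
AVRO_TYPES = {str: "string", int: "long", float: "double", bool: "boolean"}

def derive_avro_schema(record, name="airquality"):
    """Build a simple Avro record schema from one flat record.

    Every field is a nullable union so missing values do not break writes.
    Unknown types fall back to "string".
    """
    fields = [
        {
            "name": key,
            "type": ["null", AVRO_TYPES.get(type(value), "string")],
            "default": None,
        }
        for key, value in record.items()
    ]
    return {"type": "record", "name": name, "fields": fields}

records = list(split_measurements(json.loads(SAMPLE)))
schema = derive_avro_schema(records[0])
print(json.dumps(schema, indent=2))
```

Because the schema is inferred from a single record, the "value" field comes out as a long here (the sample value is -4), while other records carry decimals such as 0.004. In practice you would probably widen it to double before writing ORC files or HBase rows.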







Data Feed Links

Haze Cam Provides Web Camera Images of Potential Haze

OpenAQ (Provides Open Air Quality Data)

AirNow API (Provides forecasts and current conditions)

EPA's Air Quality Notifications

Other Sources

Version history
Last update:
‎08-17-2019 07:36 AM