
Tracking Air Quality with HDP and HDF: Part 1 - Apache NiFi Ingest

Part 2: Plan Data Storage. Store to Apache Hive, Apache Druid and Apache HBase.

Part 3: Query and Visualize Data with Apache Zeppelin and Superset

72610-airqualityflow.png

There was an air quality alert near me a few days ago, and I was curious how I could keep track of this important environmental information. So NiFi! This data is different from weather data, but for analytics it makes a lot of sense to combine it with weather data, social data and locally captured camera images. It's very easy to ingest this JSON and these camera images via Apache NiFi. In the next part we will analyze the datasets and determine how to aggregate and accumulate massive quantities of this data, so we can track air quality in various areas over time and use it as a dimension alongside other relevant data like weather.

We are tracking contaminants and particles in the air.

These include:

  • pm25, pm10 - atmospheric particulate matter (particles up to 2.5 µm and 10 µm in diameter)
  • so2 - sulfur dioxide
  • no2 - nitrogen dioxide
  • o3 - ozone
  • co - carbon monoxide

Photos Courtesy of HazeCam - Brigantine, NJ

72607-hazecam.jpg

72608-hazecam-1.jpg


Example Data

{"location":"ARB OER","city":"CA8 - ARB","country":"US","distance":3848728.319714322,"measurements":[{"parameter":"pm25","value":-4,"lastUpdated":"2016-08-08T16:00:00.000Z","unit":"µg/m³","sourceName":"AirNow","averagingPeriod":{"value":1,"unit":"hours"}}],"coordinates":{"latitude":38.568504,"longitude":-121.493256}}


{
  "location" : "MONTG",
  "parameter" : "o3",
  "date" : {
    "utc" : "2018-05-05T12:00:00.000Z",
    "local" : "2018-05-05T06:00:00-06:00"
  },
  "value" : 0.004,
  "unit" : "ppm",
  "coordinates" : {
    "latitude" : 32.4069,
    "longitude" : -86.2564
  },
  "country" : "US",
  "city" : "Montgomery"
}


Most of the data arrives as arrays of JSON, so we can easily break it down into individual JSON records, derive an Avro schema from that data and then process it as we want. We can join the records together and then convert them into ORC files or HBase rows.
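
To make the split-then-derive idea concrete outside of NiFi, here is a minimal Python sketch. It is not the NiFi flow itself. The helper names split_measurements and infer_avro_schema, the record name AirQuality, and the Python-to-Avro type mapping are all assumptions made for illustration. The sample is the first example record above.

```python
# Hypothetical helpers: a plain-Python stand-in for "split the array, then infer a schema".
AVRO_TYPES = {bool: "boolean", int: "long", float: "double", str: "string"}

def split_measurements(record):
    """Yield one flat record per entry in record["measurements"]."""
    shared = {k: v for k, v in record.items() if k != "measurements"}
    for measurement in record.get("measurements", []):
        # Fields of the individual measurement are merged over the shared ones.
        yield {**shared, **measurement}

def infer_avro_schema(name, row):
    """Build a naive Avro record schema from a single example row."""
    fields = []
    for key, value in row.items():
        if isinstance(value, dict):
            field_type = infer_avro_schema(key, value)  # nested record
        else:
            # Assumption: anything unrecognised is treated as a string.
            field_type = AVRO_TYPES.get(type(value), "string")
        fields.append({"name": key, "type": field_type})
    return {"type": "record", "name": name, "fields": fields}

sample = {
    "location": "ARB OER", "city": "CA8 - ARB", "country": "US",
    "distance": 3848728.319714322,
    "measurements": [{
        "parameter": "pm25", "value": -4,
        "lastUpdated": "2016-08-08T16:00:00.000Z",
        "unit": "\u00b5g/m\u00b3", "sourceName": "AirNow",
        "averagingPeriod": {"value": 1, "unit": "hours"},
    }],
    "coordinates": {"latitude": 38.568504, "longitude": -121.493256},
}

rows = list(split_measurements(sample))
schema = infer_avro_schema("AirQuality", rows[0])
print([field["name"] for field in schema["fields"]])
```

Inferring from a single row is fragile: value is the integer -4 in this sample, so it comes out as long, while other records carry fractional readings such as 0.004. A real schema would more likely declare value as double.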


72609-pmvalue2.png

72611-enviroflashhtml.png

72612-pmvalue.png

72613-aqireport.png

72614-airreport.png

72615-feedsattributes.png


Data Feed Links



Haze Cam Provides Web Camera Images of Potential Haze

http://hazecam.net/images/main/brigantine_right.jpg


OpenAQ (https://openaq.org/#/?_k=7mfsz6) Provides Open Air Quality Data

https://api.openaq.org/v1/latest?country=US

https://api.openaq.org/v1/measurements?country=US&date_from=2018-05-04
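
As a small illustration of how the two OpenAQ URLs above are put together, here is a Python sketch that builds them from an endpoint name and query parameters. The helper name openaq_url is hypothetical. The base URL and parameters are taken from the links listed here, and this version of the API may no longer be served, so the sketch only builds the URLs and does not call them.

```python
from urllib.parse import urlencode

# Base of the two OpenAQ URLs listed in this article (v1; may since have changed).
OPENAQ_BASE = "https://api.openaq.org/v1"

def openaq_url(endpoint, **params):
    """Build an OpenAQ request URL from an endpoint and query parameters."""
    query = urlencode(params)
    url = f"{OPENAQ_BASE}/{endpoint}"
    return f"{url}?{query}" if query else url

latest_url = openaq_url("latest", country="US")
history_url = openaq_url("measurements", country="US", date_from="2018-05-04")
print(latest_url)
print(history_url)
```

In a NiFi flow the same URLs would typically be set as a property on an HTTP-fetching processor, with the date supplied per run.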

AirNow API (Provides forecasts and current conditions)

http://www.airnowapi.org/aq/observation/zipCode/current/?format=application/json&zipCode=08520&dista...

http://www.airnowapi.org/aq/forecast/zipCode/?format=application/json&zipCode=08520&date=2018-05-02&...


EPA's Air Quality Notifications

http://feeds.enviroflash.info/

https://www.airnow.gov/index.cfm?action=airnow.national

http://feeds.enviroflash.info/rss/realtime/445.xml


Other Sources

http://feeds.enviroflash.info/cap/aggregate.xml

https://docs.openaq.org/
