Created on 05-05-2018 01:51 PM - edited 08-17-2019 07:36 AM
Tracking Air Quality with HDP and HDF: Part 1 - Apache NiFi Ingest
Part 2: Plan Data Storage. Store to Apache Hive, Apache Druid and Apache HBase.
Part 3: Query and Visualize Data with Apache Zeppelin and Superset
There was an air quality alert near me a few days ago, and I was curious how I could keep track of this important environmental information. So, NiFi! This data is different from weather data, but for analytics it makes a lot of sense to combine it with weather data, social data and locally captured camera images. It's very easy to ingest the JSON data and the camera images via Apache NiFi. In the next section, we will analyze the datasets and determine how to aggregate and accumulate massive quantities of this data to track air quality in various areas over time, and use that as a dimension alongside other relevant data such as weather.
We are tracking contaminants and particles in the air.
These include:
- pm25, pm10 - atmospheric particulate matter (PM2.5 and PM10: particles up to 2.5 µm and 10 µm in diameter)
- so2 - sulfur dioxide
- no2 - nitrogen dioxide
- o3 - ozone
- co - carbon monoxide
Photos Courtesy of HazeCam - Brigantine, NJ
Example Data
{"location":"ARB OER","city":"CA8 - ARB","country":"US","distance":3848728.319714322,"measurements":[{"parameter":"pm25","value":-4,"lastUpdated":"2016-08-08T16:00:00.000Z","unit":"µg/m³","sourceName":"AirNow","averagingPeriod":{"value":1,"unit":"hours"}}],"coordinates":{"latitude":38.568504,"longitude":-121.493256}}
{ "location" : "MONTG", "parameter" : "o3", "date" : { "utc" : "2018-05-05T12:00:00.000Z", "local" : "2018-05-05T06:00:00-06:00" }, "value" : 0.004, "unit" : "ppm", "coordinates" : { "latitude" : 32.4069, "longitude" : -86.2564 }, "country" : "US", "city" : "Montgomery" }
Most of the data arrives as arrays of JSON, so we can easily split it into individual JSON records, derive an Apache Avro schema from those records and then process them as needed. We can join the records together and then convert them into ORC files or HBase rows.
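To make the split-and-derive step concrete, here is a minimal sketch in plain Python, outside of NiFi. It is not the flow used in this article. It assumes the payload is an array of location objects shaped like the first example record above, and the helper names `flatten_latest` and `infer_avro_schema`, the record name `airquality`, and the simple type mapping are all hypothetical choices for illustration.

```python
import json

# Sample shaped like the first example record above, wrapped in an array.
sample = json.loads("""
[{"location":"ARB OER","city":"CA8 - ARB","country":"US",
  "measurements":[{"parameter":"pm25","value":-4,
                   "lastUpdated":"2016-08-08T16:00:00.000Z",
                   "unit":"µg/m³","sourceName":"AirNow"}],
  "coordinates":{"latitude":38.568504,"longitude":-121.493256}}]
""")


def flatten_latest(payload):
    """Yield one flat record per entry in each location's measurements array."""
    for loc in payload:
        coords = loc.get("coordinates") or {}
        for m in loc.get("measurements", []):
            yield {
                "location": loc.get("location"),
                "city": loc.get("city"),
                "country": loc.get("country"),
                "latitude": coords.get("latitude"),
                "longitude": coords.get("longitude"),
                "parameter": m.get("parameter"),
                "value": m.get("value"),
                "unit": m.get("unit"),
                "lastUpdated": m.get("lastUpdated"),
            }


# Assumed mapping from Python types to Avro primitive types.
AVRO_TYPES = {str: "string", int: "long", float: "double", bool: "boolean"}


def infer_avro_schema(record, name="airquality"):
    """Build a nullable-field Avro record schema from one flat record.

    Values of any other type (including None) fall back to "string".
    """
    fields = []
    for key, value in record.items():
        avro_type = AVRO_TYPES.get(type(value), "string")
        fields.append({"name": key, "type": ["null", avro_type], "default": None})
    return {"type": "record", "name": name, "fields": fields}


records = list(flatten_latest(sample))
schema = infer_avro_schema(records[0])
print(json.dumps(schema, indent=2))
```

Inferring types from a single record is fragile: a field such as `value` can be an integer in one record and a decimal in the next, so a real schema would likely be reviewed by hand before use.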
Data Feed Links
Haze Cam Provides Web Camera Images of Potential Haze
http://hazecam.net/images/main/brigantine_right.jpg
OpenAQ (https://openaq.org/#/?_k=7mfsz6) Provides Open Air Quality Data
https://api.openaq.org/v1/latest?country=US
https://api.openaq.org/v1/measurements?country=US&date_from=2018-05-04
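As a rough sketch of how the two OpenAQ URLs above are put together, the snippet below builds them from an endpoint name and query parameters using only the Python standard library. The helper names `build_url` and `fetch_json` are hypothetical, and `fetch_json` assumes the endpoints above still respond with JSON, so it is defined but not called here.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base path taken from the URLs listed above.
BASE_URL = "https://api.openaq.org/v1"


def build_url(endpoint, **params):
    """Return BASE_URL/endpoint with params encoded as a query string."""
    query = urlencode(params)
    url = f"{BASE_URL}/{endpoint}"
    return f"{url}?{query}" if query else url


def fetch_json(url, timeout=30):
    """Fetch a URL and parse the body as JSON (assumes the service is reachable)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


latest_url = build_url("latest", country="US")
measurements_url = build_url("measurements", country="US", date_from="2018-05-04")
print(latest_url)
print(measurements_url)
```

Changing `date_from` per run is one simple way to pull only recent measurements on a schedule.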
AirNow API (provides forecasts and current conditions)
EPA's Air Quality Notifications
http://feeds.enviroflash.info/
https://www.airnow.gov/index.cfm?action=airnow.national
http://feeds.enviroflash.info/rss/realtime/445.xml
Other Sources