Created on 05-05-2018 01:51 PM - edited 08-17-2019 07:36 AM
Tracking Air Quality with HDP and HDF: Part 1 - Apache NiFi Ingest
Part 2: Plan Data Storage. Store to Apache Hive, Apache Druid and Apache HBase.
Part 3: Query and Visualize Data with Apache Zeppelin and Superset
There was an air quality alert near me a few days ago, and I was curious how I could keep track of this important environmental information. So, NiFi! This data is different from weather data, but for analytics it makes a lot of sense to combine it with weather data, social data and locally captured camera images. It's very easy to ingest these JSON feeds and camera images via Apache NiFi. In the next section we will analyze the datasets and determine how to aggregate and accumulate massive quantities of this data, so that we can track air quality in various areas over time and use it as a dimension alongside other relevant data such as weather.
We are tracking contaminants and particles in the air.
Most of the data arrives as arrays of JSON, so we can easily break it down into individual JSON records, derive an Avro schema from that data and then process it as we want. We can join them together and then convert the result into ORC files or HBase rows.
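To make the "split the array, then derive a schema" idea concrete, here is a minimal sketch in plain Python, outside of NiFi. It is only an illustration of the concept, not the flow used in this article: the sample payload, its field names, the record name `AirQualityRecord` and the helper functions `split_records` and `infer_avro_schema` are all assumptions made up for the example, and it only handles flat records with primitive values.

```python
import json

# Hypothetical sample payload; the field names are illustrative only and
# are not taken from any specific air quality feed.
SAMPLE = """[
  {"DateObserved": "2018-05-05", "ParameterName": "O3", "AQI": 45, "Latitude": 40.27},
  {"DateObserved": "2018-05-05", "ParameterName": "PM2.5", "AQI": 52, "Latitude": 40.27,
   "Category": "Moderate"}
]"""

# Mapping from Python types (as produced by json.loads) to Avro primitive types.
AVRO_TYPES = {
    bool: "boolean",
    int: "long",
    float: "double",
    str: "string",
    type(None): "null",
}


def split_records(payload):
    """Parse a JSON array and return its elements as a list of records."""
    data = json.loads(payload)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


def infer_avro_schema(records, name="AirQualityRecord"):
    """Derive a simple Avro record schema from flat JSON records."""
    field_types = {}
    for record in records:
        for key, value in record.items():
            avro_type = AVRO_TYPES.get(type(value))
            if avro_type is None:
                raise ValueError(f"unsupported nested value for field {key!r}")
            field_types.setdefault(key, set()).add(avro_type)

    fields = []
    for key, types in field_types.items():
        # A field that is absent from any record is treated as nullable.
        if any(key not in record for record in records):
            types.add("null")
        others = sorted(types - {"null"})
        if "null" in types:
            # Avro convention: put "null" first in the union and default to null.
            fields.append({"name": key, "type": ["null"] + others, "default": None})
        else:
            fields.append({"name": key, "type": others[0] if len(others) == 1 else others})

    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    records = split_records(SAMPLE)
    print(json.dumps(infer_avro_schema(records), indent=2))
```

Fields that are missing from some records come out as nullable unions, which is usually what you want before writing the data to a columnar format or a row store.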
Data Feed Links
Haze Cam Provides Web Camera Images of Potential Haze