Created on 08-29-2018 03:43 PM - edited 08-17-2019 06:37 AM
IoT Edge Processing with Apache NiFi and MiNiFi and Multiple Deep Learning Libraries Series
For: https://conferences.oreilly.com/strata/strata-ny/public/schedule/detail/68140
In preparation for my talk on utilizing edge devices for deep learning, IoT sensor reading, and big data processing, I have updated my environment to the latest and greatest tools available.
With the upgrade of HDF to 3.2, I can now use Apache NiFi 1.7 and MiNiFi 0.5 for IoT data ingestion, simple event processing, conversion, data processing, data flow, and storage.
The architecture diagram above shows the basic flow we are utilizing.
IoT Step by Step
SQL Tables in Hive
I stream my data into Apache ORC files stored on HDP 3.0 HDFS directories and build external tables on them.
CREATE EXTERNAL TABLE IF NOT EXISTS rainbow (
  tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, host STRING, uniqueid STRING, ipaddress STRING,
  temp DOUBLE, diskfree STRING, altitude DOUBLE, ts STRING, tempf2 DOUBLE, memory DOUBLE
)
STORED AS ORC
LOCATION '/rainbow';

CREATE EXTERNAL TABLE IF NOT EXISTS gps (
  speed STRING, diskfree STRING, altitude STRING, ts STRING, cputemp DOUBLE, latitude STRING,
  track STRING, memory DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, epd STRING,
  utc STRING, epx STRING, epy STRING, epv STRING, ept STRING, eps STRING, longitude STRING,
  mode STRING, time STRING, climb STRING, epc STRING
)
STORED AS ORC
LOCATION '/gps';
For my processing needs, I also have Hive 3 ACID tables for general table usage and updates.
CREATE TABLE rainbowacid (
  tempf DOUBLE, cputemp DOUBLE, pressure DOUBLE, host STRING, uniqueid STRING, ipaddress STRING,
  temp DOUBLE, diskfree STRING, altitude DOUBLE, ts STRING, tempf2 DOUBLE, memory DOUBLE
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

CREATE TABLE IF NOT EXISTS gpsacid (
  speed STRING, diskfree STRING, altitude STRING, ts STRING, cputemp DOUBLE, latitude STRING,
  track STRING, memory DOUBLE, host STRING, uniqueid STRING, ipaddress STRING, epd STRING,
  utc STRING, epx STRING, epy STRING, epv STRING, ept STRING, eps STRING, longitude STRING,
  mode STRING, time STRING, climb STRING, epc STRING
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
Then I load my initial data.
INSERT INTO rainbowacid SELECT * FROM rainbow;
INSERT INTO gpsacid SELECT * FROM gps;
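Because the ACID tables are transactional, row-level changes become possible once the data is loaded. Here is a minimal sketch of the kind of updates I mean. The host names 'rainbow-old' and 'rainbow-pi' are hypothetical placeholders, not values from my data, and the cleanup rule is only an illustration.

```sql
-- Hypothetical values: 'rainbow-old' and 'rainbow-pi' are placeholders.
-- Rename a device in place; this only works on a transactional table.
UPDATE rainbowacid
SET host = 'rainbow-pi'
WHERE host = 'rainbow-old';

-- Illustrative cleanup: drop rows that arrived without a temperature reading.
DELETE FROM rainbowacid
WHERE tempf IS NULL;

-- Quick sanity check on what is left per device.
SELECT host, count(*) AS readings
FROM rainbowacid
GROUP BY host;
```

The same statements would fail against the external rainbow table, which is why I keep the ACID copies around.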
Hive 3.x Updates
%jdbc(hive)
CREATE TABLE Persons_default (
  ID Int NOT NULL,
  Name String NOT NULL,
  Age Int,
  Creator String DEFAULT CURRENT_USER(),
  CreateDate Date DEFAULT CURRENT_DATE()
);
One of the cool new features in Hive is that you can now define column defaults. As you can see, they are helpful for standard values you might want, like the current user or the current date. This gives us even more relational-style features in Hive.
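To see the defaults in action, you can insert a row that leaves out the defaulted columns. This is a small sketch, assuming the Persons_default table has been created as defined here; the sample values are made up.

```sql
-- Made-up sample row. Creator and CreateDate are omitted from the column
-- list, so Hive should fill them from the DEFAULT constraints.
INSERT INTO Persons_default (ID, Name, Age)
VALUES (1, 'Sample Person', 42);

-- Creator should show the user that ran the insert and
-- CreateDate the date it ran.
SELECT ID, Name, Age, Creator, CreateDate
FROM Persons_default;
```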
Another very interesting feature is materialized views, which store precomputed query results so that repeated subqueries stay clean and fast. Here is a cool example:
CREATE MATERIALIZED VIEW mv1 AS SELECT dest,origin,count(*) FROM flights_hdfs GROUP BY dest,origin
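Once the view exists, there are two things you will typically do with it: refresh it and let queries benefit from it. This is a hedged sketch, assuming flights_hdfs is a table you already have. Whether Hive rewrites a query to use mv1 automatically depends on your setup (for example the hive.materializedview.rewriting setting and whether the source table is transactional), so check the plan rather than taking it on faith.

```sql
-- Refresh the stored results after flights_hdfs has changed.
ALTER MATERIALIZED VIEW mv1 REBUILD;

-- A query with the same shape as the view definition. If rewriting is
-- enabled, the optimizer may answer it from mv1 instead of scanning
-- flights_hdfs.
SELECT dest, origin, count(*)
FROM flights_hdfs
GROUP BY dest, origin;

-- Look for mv1 in the plan to confirm whether the rewrite happened.
EXPLAIN
SELECT dest, origin, count(*)
FROM flights_hdfs
GROUP BY dest, origin;
```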
References: