- Subscribe to RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Report Inappropriate Content
Created on 02-27-2018 08:44 PM - edited 08-17-2019 08:42 AM
This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berling 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday April 19, 2018 at 11:50AM Berlin time.
To do proper analytics and provide fast SQL access to our inception data generated by Apache MXNet from our images, we need to land it into Apache Hive Transactional tables. We will use the Apache NiFi PutHiveStreaming processor to insert data into our ACID table at a rapid rate. This only works if you create a transactional table with Apache ORC, see the DDL below. You must also be running a new version of HDP 2.6+ that has ACID turned on.
Tip: In HDP 2.6.4, you will need to create and work with Apache Hive ACID tables with Hive. Not sql in Apache Zeppelin, since that is Apache Spark. jdbc(hive) is Apache Hive. See the configuration below to hive CBO and TEZ enabled as well.
Ambari View of Hive
SQL DDL
%jdbc(hive) CREATE TABLE `inception`( uuid STRING, top1pct STRING, top1 STRING, top2pct STRING, top2 STRING, top3pct STRING, top3 STRING, top4pct STRING, top4 STRING, top5pct STRING, top5 STRING, imagefilename STRING, runtime STRING) CLUSTERED BY ( top1) INTO 3 BUCKETS ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde' STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' TBLPROPERTIES ( 'transactional'='true') %jdbc(hive) select * from inception
The PutHiveStreaming processor requires that you have a table that is bucketed, uses Apache ORC and you have permissions. See the example above for a table DDL to use. You also need ACID and LLAP enabled on your Apache Hive cluster.
Details for PutHiveStreaming Processor
An Example Apache MXNet to Hive Streaming View
The Hive View 2.0 of the Data
Apache Zeppelin Table DDL and Query