

This article is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday, April 19, 2018, at 11:50 AM Berlin time.


See: https://community.hortonworks.com/content/kbentry/174399/apache-deep-learning-101-using-apache-mxnet...

To do proper analytics and provide fast SQL access to the inception data that Apache MXNet generates from our images, we need to land it in Apache Hive transactional tables. We will use the Apache NiFi PutHiveStreaming processor to insert data into our ACID table at a rapid rate. This only works if you create a transactional table stored as Apache ORC; see the DDL below. You must also be running HDP 2.6 or later with ACID turned on.
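As a rough illustration of the data being landed, here is a minimal Python sketch of one flat, all-string record whose field names match the columns of the `inception` table in the DDL in this article. The function name `build_record`, the sample predictions, and the percentage formatting are assumptions made for illustration; they are not taken from the MXNet script used in the talk. A NiFi flow would typically convert JSON like this to Avro, the input format PutHiveStreaming expects.

```python
import json
import uuid

# Column order taken from the `inception` DDL in this article.
COLUMNS = ["uuid", "top1pct", "top1", "top2pct", "top2", "top3pct", "top3",
           "top4pct", "top4", "top5pct", "top5", "imagefilename", "runtime"]


def build_record(predictions, imagefilename, runtime_seconds):
    """Flatten the five best (label, probability) pairs into one all-string row.

    Hypothetical helper: the one-decimal percentage format is an assumption.
    """
    top5 = sorted(predictions, key=lambda p: p[1], reverse=True)[:5]
    if len(top5) < 5:
        raise ValueError("need at least five predictions")
    record = {"uuid": str(uuid.uuid4())}
    for rank, (label, prob) in enumerate(top5, start=1):
        record[f"top{rank}pct"] = f"{prob * 100:.1f}"
        record[f"top{rank}"] = label
    record["imagefilename"] = imagefilename
    record["runtime"] = str(runtime_seconds)
    # Emit keys in DDL column order so the record lines up with the table.
    return {name: record[name] for name in COLUMNS}


if __name__ == "__main__":
    # Made-up sample predictions, for illustration only.
    sample = [("tabby cat", 0.52), ("tiger cat", 0.21), ("Egyptian cat", 0.11),
              ("lynx", 0.05), ("remote control", 0.02), ("sock", 0.01)]
    print(json.dumps(build_record(sample, "images/cat.jpg", 1.3)))
```

Every value is kept as a string because every column in the table is declared STRING.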


Tip: In HDP 2.6.4, you need to create and work with Apache Hive ACID tables through Hive itself. In Apache Zeppelin, do not use the sql interpreter, since that is Apache Spark; use jdbc(hive), which is Apache Hive. See the configuration below, where CBO and Tez are enabled as well.

Ambari View of Hive

62585-acidui.png

SQL DDL

%jdbc(hive) 

CREATE TABLE `inception` (
  uuid STRING,
  top1pct STRING, top1 STRING,
  top2pct STRING, top2 STRING,
  top3pct STRING, top3 STRING,
  top4pct STRING, top4 STRING,
  top5pct STRING, top5 STRING,
  imagefilename STRING,
  runtime STRING)
CLUSTERED BY (top1)
INTO 3 BUCKETS
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
TBLPROPERTIES ('transactional'='true')


%jdbc(hive)
select * from inception

The PutHiveStreaming processor requires a table that is bucketed and stored as Apache ORC, and you must have permission to write to it. See the example above for a table DDL to use. You also need ACID and LLAP enabled on your Apache Hive cluster.
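The table-level requirements can be sanity-checked before you wire up the flow. The sketch below is a hypothetical helper (`streaming_problems` is not part of NiFi or Hive) that only inspects the text of a DDL statement for bucketing, ORC storage, and the transactional property. It does not contact the metastore, and it cannot verify permissions or cluster-level ACID settings.

```python
import re


def streaming_problems(ddl):
    """Return the streaming prerequisites that the DDL text does not show.

    Text-only check (an assumption-laden sketch): looks for a CLUSTERED BY ...
    INTO n BUCKETS clause, ORC storage, and 'transactional'='true'.
    """
    text = ddl.lower()
    problems = []
    if not re.search(r"clustered\s+by\s*\(.*?\)\s*into\s+\d+\s+buckets", text, re.S):
        problems.append("table is not bucketed")
    if "stored as orc" not in text and "orcoutputformat" not in text:
        problems.append("table is not stored as ORC")
    if not re.search(r"'transactional'\s*=\s*'true'", text):
        problems.append("'transactional'='true' is missing")
    return problems


if __name__ == "__main__":
    good = ("CREATE TABLE t (a STRING) CLUSTERED BY (a) INTO 3 BUCKETS "
            "STORED AS ORC TBLPROPERTIES ('transactional'='true')")
    bad = "CREATE TABLE t (a STRING) STORED AS TEXTFILE"
    print(streaming_problems(good))
    print(streaming_problems(bad))
```

An empty list means the DDL text shows all three properties; any entries name what to fix before pointing PutHiveStreaming at the table.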

Details for PutHiveStreaming Processor

62586-hivestreamingproperties.png

An Example Apache MXNet to Hive Streaming View

62590-mxnethivestreaming.png

The Hive View 2.0 of the Data

62591-hiveview2.png

Apache Zeppelin Table DDL and Query

62592-zeppelinviewacid.png

