Created on 02-27-2018 08:44 PM - edited 08-17-2019 08:42 AM
This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday, April 19, 2018 at 11:50 AM Berlin time.
To do proper analytics on the inception data that Apache MXNet generates from our images, and to provide fast SQL access to it, we need to land it in Apache Hive transactional tables. We will use the Apache NiFi PutHiveStreaming processor to insert data into our ACID table at a rapid rate. This only works if you create a transactional table stored as Apache ORC; see the DDL below. You must also be running a recent version of HDP (2.6 or later) that has ACID turned on.
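As a rough sketch of what such a table can look like, the DDL below creates a bucketed, ORC-backed, transactional table. The table name, column names, and bucket count are hypothetical placeholders and not the ones used in the talk; replace them with the fields your own flow emits.

```sql
-- Hypothetical table and column names: adjust to match your flow's records.
CREATE TABLE inception_sketch (
  uuid          STRING,  -- assumed unique id per classified image
  top1          STRING,  -- assumed top predicted label
  top1pct       STRING,  -- assumed confidence for the top label
  imagefilename STRING,  -- assumed source image file name
  runtime       STRING   -- assumed inference time
)
-- Hive Streaming needs a bucketed table; 4 buckets is an arbitrary choice.
CLUSTERED BY (uuid) INTO 4 BUCKETS
-- ACID tables must be stored as ORC.
STORED AS ORC
-- Marks the table as transactional (ACID).
TBLPROPERTIES ('transactional' = 'true');
```

The bucketing column and bucket count are a design choice: a high-cardinality column such as an id spreads streamed rows evenly across buckets.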
Tip: In HDP 2.6.4, you need to create and work with Apache Hive ACID tables through Hive itself. In Apache Zeppelin, do not use the %sql interpreter, since that is Apache Spark SQL; use %jdbc(hive), which is Apache Hive. See the configuration below to have Hive CBO and Tez enabled as well.
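For orientation, here is a minimal sketch of the Hive properties usually involved in enabling ACID, CBO, and Tez. The values shown are assumptions about a typical setup, not the author's exact configuration. On HDP these are normally set cluster-wide in hive-site through Ambari, and some of them may not be changeable from a session.

```sql
-- Run queries on Tez and turn on the cost-based optimizer.
SET hive.execution.engine=tez;
SET hive.cbo.enable=true;

-- Core ACID settings: concurrency support and the transaction manager.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Needed on Hive 1.x only; bucketing is always enforced from Hive 2.0 on.
SET hive.enforce.bucketing=true;
```

Compaction for ACID tables is controlled separately on the metastore side (for example `hive.compactor.initiator.on` and `hive.compactor.worker.threads`), so it is not something a client session can switch on.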
The PutHiveStreaming processor requires a table that is bucketed and stored as Apache ORC, and you must have permissions on it. See the example above for a table DDL to use. You also need ACID and LLAP enabled on your Apache Hive cluster.