NiFi 1.6 - Load CSV to Hive table on the fly

New Contributor

We are using NiFi 1.6 and need help creating a template for the case below:

For example, I have a CSV file IRIS.csv which has headers in it (the folder contains 100 IRIS<NUMBER>.csv files), and I need to ingest (APPEND) all of these files into one table in Hive.

Currently I create the table in Hive manually. However, I need to create the table in Hive from the NiFi flow itself, so that I can parameterize the flow and later ingest data with a variety of schemas.
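To illustrate what the flow has to do: one way to build the CREATE TABLE statement on the fly (for example in an ExecuteScript processor whose output feeds PutHiveQL) is to read the header row of the incoming CSV. The sketch below is a minimal Python illustration, not NiFi code; it assumes every column can be typed as STRING (a real flow would infer types first, e.g. with InferAvroSchema), and the column names are just the usual IRIS fields:

```python
import csv
import io

def hive_ddl_from_csv_header(csv_text, table_name):
    """Build a CREATE TABLE IF NOT EXISTS statement from a CSV header row.

    All columns are typed as STRING for simplicity; a real flow would
    infer proper types before generating the DDL.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    cols = ", ".join(f"`{name.strip()}` STRING" for name in header)
    return f"CREATE TABLE IF NOT EXISTS {table_name} ({cols}) STORED AS TEXTFILE"

# Example input resembling the IRIS.csv described above
sample = "sepal_length,sepal_width,petal_length,petal_width,species\n5.1,3.5,1.4,0.2,setosa\n"
print(hive_ddl_from_csv_header(sample, "iris"))
```

Because the DDL uses IF NOT EXISTS, the same generated statement can run once per incoming file without failing on the 2nd through 100th file.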

Note: NiFi 1.6 does not support ConvertAvroToORC.

3 REPLIES

Re: NiFi 1.6 - Load CSV to Hive table on the fly

Super Guru

What do you mean that NiFi 1.6 doesn't support ConvertAvroToORC? That processor has shipped with every NiFi release since 1.0.

Re: NiFi 1.6 - Load CSV to Hive table on the fly

New Contributor

Thanks @Matt Burgess for the response.

I crosschecked with my admin team and they gave me the following information:
Earlier, the admin team found a Hive jar mismatch issue (when using PutHiveQL with HiveConnectionPool), since we use NiFi 1.6 (Hive 1.2) against CDH (Hive 1.1).
The admin team rebuilt the NiFi 1.6 Hive NAR with Hive 1.1 and were forced to comment out the ConvertAvroToORC and HiveStreaming processors.

Please let me know if there is an option to build the NiFi 1.6 Hive NAR so that it supports the ConvertAvroToORC, HiveStreaming, and PutHiveQL processors.

Re: NiFi 1.6 - Load CSV to Hive table on the fly

Super Guru

(This answer is based on the additional comments in the question.) You would need code changes to get NiFi to work with an older version of Hive (such as 1.1), which is probably why those processors are commented out. As long as hive-orc was available in Hive 1.1, you should be able to customize ConvertAvroToORC to run against Hive 1.1.

As an alternative, since you are using CDH, perhaps Parquet is a better fit than ORC? If so, you could use the PutParquet processor; it is "record-aware", so it can take the CSV in without conversion and write it directly to Parquet in HDFS.
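A rough sketch of what that flow could look like (processor and property names recalled from memory, and the HDFS path is hypothetical, so double-check everything against your NiFi version):

```
ListFile -> FetchFile       (pick up the IRIS*.csv files from the source folder)
    -> PutParquet
         Record Reader    = CSVReader (Treat First Line as Header = true)
         Directory        = /data/iris     (hypothetical HDFS target path)
         Compression Type = SNAPPY
```

With a CSVReader configured to treat the first line as a header, each file's columns are read as records and appended as Parquet files under the same directory, which an external Hive table can then sit on top of.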

As a more experimental alternative, as of NiFi 1.7.0 you can activate the "-Pinclude-hive3" profile in your Maven build, and it will produce a Hive 3 NAR that uses Apache Hive 3.0.0. The HiveQL processors will not be compatible (nor will PutHive3Streaming), but you may find that PutORC (the ORC counterpart of PutParquet) works on your system.
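For reference, activating that profile is just a flag on the normal build, run from the root of the NiFi 1.7.0 source tree (the -DskipTests flag is optional and only shortens the build):

```
mvn clean install -Pinclude-hive3 -DskipTests
```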
