NiFi 1.6 - Load CSV to Hive table on the fly

New Contributor

We are using NiFi 1.6 and need help creating a template for the following case:

For example, I have a CSV file, IRIS.csv, that has a header row. The folder contains 100 such files (IRIS<NUMBER>.csv), and I need to ingest all of them (APPEND) into a single table in Hive.

Currently I create the table in Hive manually. However, I need to create the table from the NiFi flow itself, so that I can parameterize the flow and later ingest data with a variety of schemas.

Note: NiFi 1.6 does not support ConvertAvroToORC.

3 REPLIES

Super Guru

What do you mean that NiFi 1.6 doesn't support ConvertAvroToORC? That processor should be in every NiFi release since 1.0.

New Contributor

Thanks @Matt Burgess for the response.

I cross-checked with my admin team, and they gave me the information below:
Earlier, the admin team found a Hive JAR mismatch issue (when using PutHiveQL with HiveConnectionPool), because we use NiFi 1.6 (Hive 1.2) with CDH (Hive 1.1).
The admin team rebuilt the NiFi 1.6 NAR against Hive 1.1 and was forced to comment out the ConvertAvroToORC and HiveStreaming processors.

Please let me know if there is an option to build the NiFi 1.6 NAR so that it supports the ConvertAvroToORC, HiveStreaming, and PutHiveQL processors.

Super Guru

(This answer is based on the additional comments in the question.) You would need code changes to get NiFi to work with an older version of Hive (such as 1.1), which is probably why those processors were commented out. As long as hive-orc was available in Hive 1.1, you should be able to customize ConvertAvroToORC to run on Hive 1.1.

As an alternative, since you are using CDH, perhaps Parquet is a better fit than ORC? If so, you could use the PutParquet processor. It is "record-aware", so it can take the CSV in without conversion and write it directly to Parquet in HDFS.
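
To illustrate the "create the table from the flow" part of the question, here is a minimal standalone sketch in Python of how a table definition could be derived from a CSV header. It is not taken from NiFi itself; the function names (hive_identifier, build_parquet_ddl), the table name, and the HDFS location are hypothetical. It assumes every column is typed as STRING and that the Parquet files land in a single HDFS directory, over which an external table is declared. The resulting statement could then be run against Hive, for example through PutHiveQL if that processor works in your build.

```python
import csv
import io
import re


def hive_identifier(name):
    """Turn a CSV header into a conservative Hive column name (hypothetical helper)."""
    cleaned = re.sub(r"[^0-9a-zA-Z_]", "_", name.strip()).lower()
    # Avoid empty names and names that begin with a digit.
    if not cleaned or cleaned[0].isdigit():
        cleaned = "c_" + cleaned
    return cleaned


def build_parquet_ddl(csv_text, table, location):
    """Build a CREATE EXTERNAL TABLE statement from the first row of csv_text.

    Assumption: all columns are declared as STRING, so the Parquet files
    written to `location` must store the fields as strings as well.
    """
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = ",\n".join(
        "  `{}` STRING".format(hive_identifier(h)) for h in header
    )
    return (
        "CREATE EXTERNAL TABLE IF NOT EXISTS `{}` (\n{}\n)\n"
        "STORED AS PARQUET\n"
        "LOCATION '{}'"
    ).format(table, columns, location)


if __name__ == "__main__":
    # Sample header and row are made up for illustration.
    sample = "sepal.length,Sepal Width,species\n5.1,3.5,setosa\n"
    print(build_parquet_ddl(sample, "iris", "/data/iris"))
```

Because the statement uses IF NOT EXISTS, it can be issued for every incoming file without failing once the table is there, and new Parquet files written to the same directory become visible to the table, which gives the APPEND behaviour asked about. Typing every column as STRING is a simplification; real type mapping would need a schema source beyond the header row.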

As a more experimental alternative, as of NiFi 1.7.0 you can activate the "-Pinclude-hive3" profile in your Maven build, and it will produce a Hive 3 NAR that uses Apache Hive 3.0.0. The HiveQL processors will not be compatible (nor will PutHive3Streaming), but you may find that PutORC (the ORC version of PutParquet) does work on your system.
