
Converting CSV Files to Apache Hive Tables with Apache ORC Files


I received some CSV files of data to load into Apache Hive. There are many ways to do this, but I wanted to see how easy it was to do in Apache NiFi with zero code.

72565-cvsprocessing1.png

I read the CSV files from a directory, then convert the CSV to Apache Avro directly with ConvertRecord.

72562-convertrecordcsv.png

I will need a schema, so I use the settings below for InferAvroSchema. If every file is different, you will need to do this every time.

72563-avroschemafromcsv.png

CSV Reader

72564-csvreader.png

I use the Jackson CSV parser, which works very well. The first line of each CSV file is a header, and the reader takes the field names from it.
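The NiFi flow does this with no code, but to show what inferring a schema from a header line involves, here is a minimal Python sketch. It is not how InferAvroSchema is implemented; the function names, the record name, and the type-guessing rule (try long, then double, else string) are assumptions made for illustration.

```python
import csv
import io
import json


def infer_type(values):
    """Pick the narrowest Avro primitive that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    # Try integer first, then floating point; fall back to string.
    for avro_type, caster in (("long", int), ("double", float)):
        try:
            for v in non_empty:
                caster(v)
            return avro_type
        except ValueError:
            continue
    return "string"


def infer_avro_schema(csv_text, record_name="csv_record"):
    """Build an Avro record schema (as a dict) from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    fields = []
    for i, name in enumerate(header):
        column = [row[i] for row in data if i < len(row)]
        # Every field is nullable, since CSV cells may be empty.
        fields.append({"name": name.strip(), "type": ["null", infer_type(column)]})
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample data, not taken from the article's files.
sample = "id,name,price\n1,widget,9.99\n2,gadget,12\n"
schema = infer_avro_schema(sample)
print(json.dumps(schema, indent=2))
```

Because the schema is derived from the header and the values, a file with different columns yields a different schema, which is why inference has to run again whenever the file layout changes.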

Once I have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS.
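With the ORC files in HDFS, Hive still needs a table definition that points at them. The sketch below builds one plausible `CREATE EXTERNAL TABLE` statement from a list of field names and Avro primitive types. The table name, field list, and HDFS path are hypothetical, and the type mapping covers only the common primitives.

```python
# Avro primitive type -> Hive column type (common primitives only).
AVRO_TO_HIVE = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
}


def hive_ddl(table, fields, location):
    """Return a Hive DDL statement for an external ORC table.

    fields is a list of (column_name, avro_primitive_type) pairs.
    """
    columns = ",\n".join(
        f"  `{name}` {AVRO_TO_HIVE[avro_type]}" for name, avro_type in fields
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n"
        f"{columns}\n"
        f")\n"
        f"STORED AS ORC\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table, columns, and path for illustration only.
ddl = hive_ddl(
    "csv_data",
    [("id", "long"), ("name", "string"), ("price", "double")],
    "/tmp/csv_data_orc",
)
print(ddl)
```

An external table leaves the files where the flow wrote them, so dropping the table in Hive does not delete the ORC data.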

Template:

csvprocess.xml

Last update: 08-17-2019 07:38 AM (revision 2 of 2)