Converting CSV Files to Apache Hive Tables with Apache ORC Files


I received some CSV files of data to load into Apache Hive. There are many ways to do this, but I wanted to see how easy it was to do in Apache NiFi with zero code.

72565-cvsprocessing1.png

I read the CSV files from a directory, then convert the CSV to Apache Avro directly with ConvertRecord.

72562-convertrecordcsv.png

I will need a schema, so I use the settings below for InferAvroSchema. If every file is different, you will need to infer the schema every time.

72563-avroschemafromcsv.png
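To make the schema step concrete, here is a minimal Python sketch of the general idea behind inferring an Avro schema from a CSV header and its rows. It is not how InferAvroSchema is implemented; the function names, the `csv_record` record name, and the sample data are all assumptions made for illustration.

```python
import csv
import io
import json


def infer_type(values):
    """Return the narrowest Avro primitive type that fits every non-empty value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for avro_type, caster in (("long", int), ("double", float)):
        try:
            for v in non_empty:
                caster(v)
            return avro_type
        except ValueError:
            continue
    return "string"


def infer_avro_schema(csv_text, record_name="csv_record"):
    """Build an Avro record schema (as a dict) from CSV text with a header line."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    fields = []
    for i, name in enumerate(header):
        column = [row[i] for row in data if i < len(row)]
        # Every field is a nullable union so that empty cells stay valid.
        fields.append(
            {"name": name, "type": ["null", infer_type(column)], "default": None}
        )
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample data, not taken from the article's files.
sample = "id,name,price\n1,widget,9.99\n2,gadget,12.5\n"
schema = infer_avro_schema(sample)
print(json.dumps(schema, indent=2))
```

Because the types come from the data itself, two files with different columns produce different schemas, which is why the inference has to be repeated whenever the file layout changes.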

CSV Reader

72564-csvreader.png

I use the Jackson CSV parser, which works very well. The first line of the CSV is a header, and the reader can figure out the field names from it.
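As a rough illustration of header-driven parsing, the sketch below uses Python's standard `csv` module as a stand-in for the NiFi CSV reader. The sample data is hypothetical. It shows field names being taken from the first line and a quoted value containing a comma staying in one field.

```python
import csv
import io

# Hypothetical sample: the first line is the header, one value contains a comma.
sample = 'id,name,note\n1,widget,"small, blue"\n2,gadget,\n'

reader = csv.DictReader(io.StringIO(sample))
records = list(reader)

print(reader.fieldnames)
for record in records:
    print(record)
```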

Once I have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS.
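For Hive to query the ORC files, a table has to be defined over the HDFS directory that holds them. The following Python sketch shows one way to derive such a DDL statement from an Avro record schema. The table name `csv_data`, the location `/tmp/csv_data_orc`, and the helper functions are assumptions for illustration, and only primitive Avro types are handled.

```python
# Mapping from Avro primitive types to Hive column types.
AVRO_TO_HIVE = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "bytes": "BINARY",
}


def hive_type(avro_type):
    """Map an Avro primitive (or nullable union of one primitive) to a Hive type."""
    if isinstance(avro_type, list):
        # A union such as ["null", "long"] maps to the type of its non-null branch.
        branches = [t for t in avro_type if t != "null"]
        if len(branches) != 1:
            raise ValueError(f"unsupported union: {avro_type}")
        avro_type = branches[0]
    return AVRO_TO_HIVE[avro_type]


def orc_table_ddl(schema, table, location):
    """Return a CREATE EXTERNAL TABLE statement for ORC files at `location`."""
    columns = ",\n  ".join(
        f"`{field['name']}` {hive_type(field['type'])}" for field in schema["fields"]
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n  {columns}\n)\n"
        f"STORED AS ORC\nLOCATION '{location}'"
    )


# Hypothetical schema, table name, and HDFS path.
schema = {
    "type": "record",
    "name": "csv_record",
    "fields": [
        {"name": "id", "type": ["null", "long"]},
        {"name": "name", "type": ["null", "string"]},
        {"name": "price", "type": ["null", "double"]},
    ],
}
ddl = orc_table_ddl(schema, "csv_data", "/tmp/csv_data_orc")
print(ddl)
```

An external table is used here so that dropping the table in Hive leaves the ORC files in HDFS untouched.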

Template:

csvprocess.xml

Last update: 08-17-2019 07:38 AM