Converting CSV Files to Apache Hive Tables with Apache ORC Files


I received some CSV files of data to load into Apache Hive. There are many ways to do this, but I wanted to see how easy it was to do in Apache NiFi with zero code.

72565-cvsprocessing1.png

I read the CSV files from a directory, then convert the CSV to Apache Avro directly with ConvertRecord.

72562-convertrecordcsv.png

I need a schema, so I use the settings below for InferAvroSchema. If every file is different, you will need to infer the schema every time.

72563-avroschemafromcsv.png
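
NiFi handles this step without any code, but it can help to see roughly what inferring a schema from a CSV amounts to. The following is a minimal Python sketch, not NiFi's implementation: the function names `infer_type` and `infer_avro_schema`, the record name `csv_record`, the sample size, and the choice to make every field nullable are all assumptions made for illustration.

```python
import csv
import io
import json


def infer_type(values):
    """Pick the narrowest Avro primitive type that fits every sample value."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    # With no samples to look at, fall back to string.
    if values and all_parse(int):
        return "long"
    if values and all_parse(float):
        return "double"
    return "string"


def infer_avro_schema(csv_text, record_name="csv_record", sample_rows=10):
    """Build an Avro record schema (as a dict) from a CSV header and sample rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # the first line holds the field names
    samples = [row for _, row in zip(range(sample_rows), reader)]

    fields = []
    for index, name in enumerate(header):
        # Ignore empty cells so that a missing value does not force "string".
        column = [row[index] for row in samples
                  if index < len(row) and row[index] != ""]
        fields.append({"name": name.strip(),
                       "type": ["null", infer_type(column)]})
    return {"type": "record", "name": record_name, "fields": fields}


if __name__ == "__main__":
    sample = "id,name,price\n1,apple,0.5\n2,pear,1.25\n"
    print(json.dumps(infer_avro_schema(sample), indent=2))
```

Because the types come from a sample of rows, a file with different columns or values can produce a different schema, which is why inference has to be repeated when the files differ.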

CSV Reader

72564-csvreader.png

I use the Jackson CSV parser, which works very well. The first line of the CSV is a header, and the reader can figure out the field names from it.

Once I have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS.
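
To query the stored ORC files, Hive needs a table definition whose columns match the schema. As a rough sketch of how an Avro schema could map to such a definition, the Python below builds a `CREATE EXTERNAL TABLE` statement from a schema dict. The helper names `hive_type` and `hive_ddl`, the table name, and the HDFS location are hypothetical, and complex Avro types are simply treated as strings here.

```python
# Mapping from Avro primitive types to Hive column types.
AVRO_TO_HIVE = {
    "boolean": "BOOLEAN",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "bytes": "BINARY",
    "string": "STRING",
}


def hive_type(avro_type):
    """Translate an Avro field type into a Hive column type."""
    # Nullable fields are unions such as ["null", "long"]; use the non-null branch.
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        avro_type = non_null[0] if non_null else "string"
    # Simplification for this sketch: anything that is not a primitive name
    # (records, arrays, maps) becomes STRING.
    if not isinstance(avro_type, str):
        return "STRING"
    return AVRO_TO_HIVE.get(avro_type, "STRING")


def hive_ddl(schema, table_name, location):
    """Build a CREATE EXTERNAL TABLE statement for ORC files at `location`."""
    columns = ", ".join(
        f"`{field['name']}` {hive_type(field['type'])}"
        for field in schema["fields"]
    )
    return (f"CREATE EXTERNAL TABLE IF NOT EXISTS {table_name} ({columns}) "
            f"STORED AS ORC LOCATION '{location}'")


if __name__ == "__main__":
    example_schema = {
        "type": "record",
        "name": "csv_record",
        "fields": [
            {"name": "id", "type": ["null", "long"]},
            {"name": "name", "type": "string"},
        ],
    }
    # Table name and path are placeholders, not values from the template.
    print(hive_ddl(example_schema, "csv_data", "/tmp/csv_data"))
```

An external table is used in this sketch so that Hive reads the ORC files where the flow wrote them instead of moving them.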

Template:

csvprocess.xml
