Converting CSV Files to Apache Hive Tables with Apache ORC Files

I received some CSV files of data to load into Apache Hive. There are many ways to do this, but I wanted to see how easy it was to do in Apache NiFi with zero code.


I read the CSV files from a directory. Then I can convert the CSV to Avro directly with ConvertRecord.


I will need a schema, so I use the settings below for InferAvroSchema. If every file is different, you will need to infer the schema every time.
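To make the idea of schema inference concrete, here is a minimal Python sketch of how an Avro record schema could be derived from a CSV header and a few sample rows. It is only an illustration and not how InferAvroSchema is implemented; the function names (`infer_avro_type`, `infer_avro_schema`), the record name `csv_record`, and the sample data are all hypothetical.

```python
import csv
import io
import json


def infer_avro_type(values):
    """Pick the narrowest Avro primitive type that fits every sample value."""
    def all_parse(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if not values:
        return "string"  # no samples to inspect, fall back to string
    if all_parse(int):
        return "long"
    if all_parse(float):
        return "double"
    return "string"


def infer_avro_schema(csv_text, record_name="csv_record"):
    """Build an Avro record schema (as a dict) from CSV text with a header row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    fields = []
    for i, name in enumerate(header):
        # Collect the non-empty sample values for this column.
        column = [row[i] for row in samples if i < len(row) and row[i] != ""]
        # Every field is made nullable so that empty cells remain valid.
        fields.append({"name": name.strip(), "type": ["null", infer_avro_type(column)]})
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample data, used only to exercise the sketch.
sample = "id,name,price\n1,widget,9.99\n2,gadget,12.50\n"
schema = infer_avro_schema(sample)
print(json.dumps(schema, indent=2))
```

Because the types are guessed from the sample rows, a file with different columns or values can produce a different schema, which is why the inference has to be repeated whenever the files differ.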


CSV Reader


I use the Jackson CSV parser, which works very well. The first line of the CSV is a header, so the reader can figure out the field names from it.
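As a rough picture of what header-driven parsing means, the short Python sketch below uses the standard library's `csv.DictReader` as a stand-in for the reader (it is not the Jackson parser, and the sample data is made up). The first line supplies the field names, and each later line becomes a record keyed by those names.

```python
import csv
import io

# Hypothetical CSV content whose first line is the header.
csv_text = "id,name,price\n1,widget,9.99\n2,gadget,12.50\n"

reader = csv.DictReader(io.StringIO(csv_text))
field_names = reader.fieldnames  # taken from the header line
records = list(reader)           # one dict per data line, all values are strings

print(field_names)
for record in records:
    print(record)
```

Note that every value arrives as a string here; assigning real types is the job of the schema.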

Once I have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS.