Created on 05-03-2018 04:00 PM - edited 08-17-2019 07:38 AM
Converting CSV Files to Apache Hive Tables with Apache ORC Files
I received some CSV files of data to load into Apache Hive. There are many ways to do this, but I wanted to see how easy it was to do in Apache NiFi with zero code.
I read the CSV files from a directory, then convert the CSV to Apache Avro directly with ConvertRecord.
I will need a schema, so I use the settings below for InferAvroSchema. If every file is different, you will need to do this every time.
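To make the idea of schema inference concrete, here is a minimal Python sketch of what inferring an Avro schema from a CSV file can look like. It is not NiFi's InferAvroSchema implementation. The function names, the record name `csv_record`, the sample data, and the simple long/double/string guessing rule are all assumptions made for illustration.

```python
import csv
import io
import json


def guess_avro_type(values):
    """Guess 'long', 'double', or 'string' from a column's sample values."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if values and all_parse(int):
        return "long"
    if values and all_parse(float):
        return "double"
    # Empty columns and anything non-numeric fall back to string.
    return "string"


def infer_avro_schema(csv_text, record_name="csv_record"):
    """Build an Avro record schema (as a dict) from CSV text with a header line."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, samples = rows[0], rows[1:]
    fields = []
    for i, name in enumerate(header):
        # Collect the non-empty sample values for this column.
        column = [row[i] for row in samples if i < len(row) and row[i] != ""]
        # Every field is made nullable via a union with "null".
        fields.append({"name": name, "type": ["null", guess_avro_type(column)]})
    return {"type": "record", "name": record_name, "fields": fields}


# Hypothetical sample data, not taken from the original files.
sample = "id,name,price\n1,widget,9.99\n2,gadget,12.50\n"
print(json.dumps(infer_avro_schema(sample), indent=2))
```

Because the schema comes from the header and the sampled rows, a file with different columns produces a different schema, which is why the inference has to be repeated when every file is different.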
CSV Reader
I use the Jackson CSV parser, which works very well. The first line of the CSV is a header, so the reader can figure out the field names from it.
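As a rough illustration of header-driven parsing, the sketch below uses Python's standard `csv` module as a stand-in for the NiFi CSV reader (it is not the Jackson parser). The sample data is hypothetical.

```python
import csv
import io

# Hypothetical sample data; the first line is the header.
sample = "id,name,price\n1,widget,9.99\n2,gadget,12.50\n"

# DictReader consumes the first line as the header and uses it as the keys
# of every record that follows.
reader = csv.DictReader(io.StringIO(sample))
records = list(reader)

print(reader.fieldnames)
for record in records:
    # Values are plain strings here; typing them is the schema's job.
    print(record["name"], record["price"])
```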
Once I have an Apache Avro file, it's easy to convert it to Apache ORC and then store it in HDFS.
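Once the ORC files are in HDFS, Hive still needs a table definition that points at them. The Python sketch below shows one way to derive such a statement from an Avro schema. The table name `csv_data`, the location `/tmp/csv_orc`, the example schema, and the helper names are assumptions for illustration, and complex Avro types simply fall back to STRING.

```python
# Mapping from Avro primitive types to Hive column types.
AVRO_TO_HIVE = {
    "string": "STRING",
    "int": "INT",
    "long": "BIGINT",
    "float": "FLOAT",
    "double": "DOUBLE",
    "boolean": "BOOLEAN",
    "bytes": "BINARY",
}


def hive_type(avro_type):
    """Map an Avro field type to a Hive column type."""
    # Nullable fields are unions such as ["null", "long"]; use the non-null branch.
    if isinstance(avro_type, list):
        non_null = [t for t in avro_type if t != "null"]
        avro_type = non_null[0] if len(non_null) == 1 else "string"
    # Complex types (records, arrays, maps) fall back to STRING in this sketch.
    if not isinstance(avro_type, str):
        return "STRING"
    return AVRO_TO_HIVE.get(avro_type, "STRING")


def hive_ddl(schema, table, location):
    """Build a CREATE EXTERNAL TABLE statement for ORC files at `location`."""
    columns = ",\n".join(
        f"  `{f['name']}` {hive_type(f['type'])}" for f in schema["fields"]
    )
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{table}` (\n{columns}\n)\n"
        f"STORED AS ORC\nLOCATION '{location}'"
    )


# Hypothetical schema, table name, and HDFS directory.
example_schema = {
    "type": "record",
    "name": "csv_record",
    "fields": [
        {"name": "id", "type": ["null", "long"]},
        {"name": "name", "type": ["null", "string"]},
        {"name": "price", "type": ["null", "double"]},
    ],
}
print(hive_ddl(example_schema, "csv_data", "/tmp/csv_orc"))
```

An external table is used here so that Hive reads the ORC files where they were written instead of moving them into its warehouse directory.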