Support Questions

egarelnabi · ‎08-09-2017

I have two files that get dropped into a folder. The first is a CSV file containing the data to be processed and landed in Hive. The second is an XML file that contains metadata about the CSV file: the compression to use (Snappy, etc.), the HDFS storage format (Avro, ORC, etc.), the table the data needs to be saved to, the columns/schema of the CSV, and some other information.

My question is: what is the best strategy in NiFi for using this metadata file to process the CSV file and land the data in Hive?

I've looked at using Schema Registry, but I believe that will only cover the column-mapping portion, not the other info such as table name, storage format, and compression.

mliem · ‎08-09-2017

Hey Eyad,

One option is to use the XML file as the starting point/ingestion trigger. Once you pick it up with GetFile/FetchFile, you can pass it to EvaluateXPath to parse the XML and turn the values into flow file attributes.
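
To make the "turn the values into attributes" step concrete, here is a rough Python sketch of the idea outside NiFi. It is only an illustration: the XML layout, the element names, and the `meta.*` attribute names are all made up, since the real metadata file isn't shown in this thread. In NiFi itself, each entry in the mapping would roughly correspond to a dynamic property on EvaluateXPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata layout; the real XML schema is not shown in the thread.
SAMPLE_XML = """
<metadata>
  <table>sales</table>
  <format>ORC</format>
  <compression>SNAPPY</compression>
  <columns>
    <column name="id" type="INT"/>
    <column name="amount" type="DOUBLE"/>
  </columns>
</metadata>
"""

# Attribute name -> path expression (assumed names, for illustration only).
PATHS = {
    "meta.table": "table",
    "meta.format": "format",
    "meta.compression": "compression",
}


def extract_attributes(xml_text):
    """Parse the metadata XML and return a flat dict of string attributes."""
    root = ET.fromstring(xml_text)
    attrs = {
        name: root.findtext(path, default="").strip()
        for name, path in PATHS.items()
    }
    # Flatten the column definitions into a single "name TYPE, ..." string.
    cols = [f'{c.get("name")} {c.get("type")}' for c in root.findall("columns/column")]
    attrs["meta.columns"] = ", ".join(cols)
    return attrs


attributes = extract_attributes(SAMPLE_XML)
print(attributes)
```

Keeping everything as flat strings mirrors how flow file attributes behave, so later steps can reference them by name.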

Once you have the attributes, you should have everything you need to prep the file (fetch the CSV, create the table, PutHDFS, etc.). We do something similar for our ingestion but use a SQL database that holds all the metadata. Once we detect a file, we query MySQL to pull in the same kind of info you have in your XML file.
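
As a sketch of the "create table" step, the snippet below builds a Hive DDL statement from a dict of attributes. The attribute names (`meta.table`, `meta.format`, and so on) and the sample values are assumptions for illustration, and only ORC is handled here because other formats configure compression differently. In a NiFi flow, the equivalent would likely be a statement assembled from attributes with Expression Language and sent to a processor such as PutHiveQL.

```python
import re

# Example attribute values; names and values are hypothetical.
attrs = {
    "meta.table": "sales",
    "meta.format": "ORC",
    "meta.compression": "SNAPPY",
    "meta.columns": "id INT, amount DOUBLE",
}

# Conservative whitelist for identifiers, since the values come from a file.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_create_table(a):
    """Return a CREATE TABLE statement for an ORC table (sketch only)."""
    table = a["meta.table"]
    if not IDENTIFIER.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    if a["meta.format"].upper() != "ORC":
        # Avro, Parquet, etc. would need their own handling.
        raise ValueError("this sketch only covers ORC")
    compression = a["meta.compression"].upper()
    if not IDENTIFIER.match(compression):
        raise ValueError(f"unsafe compression value: {compression!r}")
    return (
        f"CREATE TABLE IF NOT EXISTS {table} "
        f"({a['meta.columns']}) "
        f"STORED AS ORC "
        f"TBLPROPERTIES ('orc.compress'='{compression}')"
    )


ddl = build_create_table(attrs)
print(ddl)
```

The column list is passed through unchecked here; a real flow would want to validate each column name and type as well.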

egarelnabi · ‎08-14-2017

Thanks Matt,

Interesting approach, and it makes a lot of sense to do things that way. I'll give it a try. Thanks for your help.

Cloudera Community
How to process CSV file based on info in an XML metadata file using Nifi