How to process a CSV file based on info in an XML metadata file using NiFi

I have two files that get dropped into a folder. The first is a CSV file containing the data to be processed and landed in Hive. The second is an XML file that contains metadata about the CSV file. The metadata file contains information such as the compression to be used (Snappy, etc.), the HDFS storage format (Avro, ORC, etc.), the table the data needs to be saved to, and the columns/schema of the CSV, as well as some other information.

My question is: what is the best strategy in NiFi for using this metadata file to process the CSV file and land the data in Hive?

I've looked at using Schema Registry, but I believe that will only cover the column-mapping portion and not the other info, such as table name, storage format, and compression.

1 ACCEPTED SOLUTION


Hey Eyad,

One option is to use the XML file as the starting point and trigger for ingestion. Once you pick it up with GetFile or FetchFile, you can pass it to EvaluateXPath to parse the XML and turn the values into flow file attributes.
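To make the attribute extraction concrete, here is a minimal Python sketch of what that step does conceptually: each attribute name is paired with an XPath expression, and the matched text becomes the attribute value. It is not NiFi code. The XML layout, element names, and attribute names (`hive.table`, `csv.filename`, etc.) are all assumptions made up for illustration, since the actual metadata format was not posted.

```python
import xml.etree.ElementTree as ET

# Hypothetical metadata file; the real element names will differ.
SAMPLE_METADATA = """\
<metadata>
  <table>sales_raw</table>
  <storageFormat>ORC</storageFormat>
  <compression>SNAPPY</compression>
  <dataFile>sales.csv</dataFile>
</metadata>
"""

# Attribute name -> XPath, similar in spirit to one property per
# attribute on an XPath-evaluating processor. Names are illustrative.
XPATHS = {
    "hive.table": "./table",
    "hive.format": "./storageFormat",
    "hive.compression": "./compression",
    "csv.filename": "./dataFile",
}


def extract_attributes(xml_text, xpaths=XPATHS):
    """Return a dict of attribute name -> text of the first matching node."""
    root = ET.fromstring(xml_text)
    attrs = {}
    for name, path in xpaths.items():
        node = root.find(path)
        # Skip paths that match nothing or match an empty element.
        if node is not None and node.text:
            attrs[name] = node.text.strip()
    return attrs


attributes = extract_attributes(SAMPLE_METADATA)
```

With the attributes in hand, `attributes["csv.filename"]` tells the rest of the flow which CSV to fetch, and the remaining values drive the table and storage settings.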

Once you have the attributes, you should have everything you need to prep the file (fetch the file, create the table, PutHDFS, etc.). We do something similar for our ingestion, but we use a SQL database that holds all the metadata. Once we detect a file, we query MySQL to pull in the same kind of info you have in your XML file.
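As a rough illustration of the "create table" step, the sketch below turns a set of extracted attributes plus a column list into a Hive `CREATE TABLE` statement. The attribute names and the `build_create_table` helper are hypothetical, and the sketch only handles simple column types and only sets a compression property for ORC (`orc.compress`); other formats would need their own settings. In a flow, a statement like this could be built with expression language and handed to a processor that executes HiveQL.

```python
import re

# Allow only plain identifiers so attribute values cannot inject SQL.
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_create_table(attrs, columns):
    """Build a Hive DDL string from metadata attributes.

    attrs   -- dict with hypothetical keys 'hive.table', 'hive.format'
               and optionally 'hive.compression'
    columns -- list of (name, type) tuples, simple types only
    """
    table = attrs["hive.table"]
    fmt = attrs["hive.format"].upper()
    codec = attrs.get("hive.compression", "").upper()

    # Validate every value that ends up in the statement.
    to_check = [table, fmt] + [part for col in columns for part in col]
    if codec:
        to_check.append(codec)
    for value in to_check:
        if not IDENT.match(value):
            raise ValueError(f"unsafe identifier: {value!r}")

    cols = ", ".join(f"`{name}` {ctype.upper()}" for name, ctype in columns)
    ddl = f"CREATE TABLE IF NOT EXISTS `{table}` ({cols}) STORED AS {fmt}"
    # Assumption: only ORC compression is configured via table properties here.
    if fmt == "ORC" and codec:
        ddl += f" TBLPROPERTIES ('orc.compress'='{codec}')"
    return ddl


ddl = build_create_table(
    {"hive.table": "sales_raw", "hive.format": "orc", "hive.compression": "snappy"},
    [("id", "int"), ("amount", "double")],
)
```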


Thanks Matt,

That's an interesting approach, and it makes a lot of sense to do things that way. I'll give it a try. Thanks for your help.