Created 03-03-2017 08:57 AM
I need to ingest an XML file containing a relational database model and populate each table in Hive.
I have created a NiFi flow with a SplitXml processor, but I cannot figure out what to do next.
When I run my flow I can see that my XML is split into 5 flowfiles, but what do I do after that in order to
create Hive tables and insert data, or create CSV files that can be inserted into existing Hive tables?
Right now the only things I have are my XML file and an XSD file.
I need guidelines and best practices for processing rather complex XML and loading it into Hive.
Created 03-03-2017 03:19 PM
Clearly this is not exactly what you are looking for: it only covers the HDFS file to Hive table part of the problem, and it uses a VERY SIMPLE XML file. Still, https://github.com/lestermartin/oss-transform-processing-comparison/tree/master/file-formats/xml#hiv... might be of some small help with the overall problem you are facing. Good luck!
Created 03-04-2017 06:11 AM
I am not answering your question directly, just sharing my XML experience: I use Spark to read XML files and save the resulting DataFrames to Hive tables for further analysis.
Thanks for the NiFi idea, I will look into it.
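For anyone who wants to try the Spark route mentioned above, here is a rough PySpark sketch, not the poster's actual code. It assumes the spark-xml package is available to the Spark session (so that the "xml" data source resolves) and that the session was created with Hive support enabled. The function name, the row tag and the table name are hypothetical placeholders.

```python
def load_xml_into_hive(spark, xml_path, row_tag, table_name):
    """Read repeated XML elements into a DataFrame and save it as a table.

    spark      -- an existing SparkSession (assumed to have Hive support
                  and the spark-xml data source available)
    xml_path   -- location of the XML file, e.g. a path on HDFS
    row_tag    -- name of the XML element that represents one row
    table_name -- target table name (hypothetical, e.g. "staging.customer")
    """
    df = (
        spark.read.format("xml")      # data source provided by spark-xml
        .option("rowTag", row_tag)    # each <row_tag> element becomes a row
        .load(xml_path)
    )
    # Overwrite the target table with the parsed rows.
    df.write.mode("overwrite").saveAsTable(table_name)
    return df
```

With one call per table (each using a different row tag), an XML file that holds several tables could be loaded table by table. Nested elements come back as struct or array columns, so complex XML will probably need some flattening before the write.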
Created 07-18-2018 10:48 AM
I hope you have solved this problem by now, as it has been over a year since you asked the question.
But if not, I wrote an XML2CSV converter processor that can be used to solve your problem.
The source code can be found here: nifi-xml, including documentation and a guide.
/Max
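To illustrate the general XML-to-CSV idea outside NiFi, here is a minimal Python sketch using only the standard library. It is not the nifi-xml processor mentioned above and does not reflect how that processor works. It assumes a hypothetical, simple layout in which each child of the root element is a table, each child of a table is a row, and each child of a row is a column; real XML described by an XSD will likely be more deeply nested.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv_tables(xml_text):
    """Return a dict mapping table name -> CSV text (header row included).

    Assumed layout (hypothetical): <root><table><row><col>value</col>...
    """
    root = ET.fromstring(xml_text)
    tables = {}
    for table in root:
        # One dict per row: column name -> text value ("" if empty).
        rows = [
            {col.tag: (col.text or "").strip() for col in row}
            for row in table
        ]
        # Collect column names in order of first appearance.
        columns = []
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)  # columns missing from a row are written as ""
        tables[table.tag] = buf.getvalue()
    return tables


# Made-up sample data for illustration only.
SAMPLE_XML = (
    "<db>"
    "<customer><row><id>1</id><name>Ann</name></row>"
    "<row><id>2</id><name>Bob</name></row></customer>"
    "<orders><row><id>10</id><customer_id>1</customer_id></row></orders>"
    "</db>"
)

for name, csv_text in xml_to_csv_tables(SAMPLE_XML).items():
    print(name)
    print(csv_text)
```

Each CSV string could then be written to its own file and placed under the location of a matching Hive table. Because the output includes a header row, the table would need to skip it (for example with the `skip.header.line.count` table property), or the `writeheader()` call can be dropped.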