Need some help here.
I was able to convert one of the XML files in the zip (trafficLocs_data_for_simulator.zip) to an Avro schema by defining its structure in EvaluateXPath (image attached for reference). Many thanks to @mqureshi for his help in solving my last question.
Now I want to understand how we handle bigger XMLs. Do we need to define the structure completely in EvaluateXPath, or is there a simpler way to handle this?
How do we handle conversion into Avro for the big XMLs that really exist in real life? Please advise.
Attached some XMLs for your reference.
If you have really big XML files that are not arriving in real time, but rather sitting on machines, then I would not use NiFi. NiFi is more for real-time data flow. For a use case where you have large files to import and convert between formats, for example from XML to Avro, I would suggest writing a script: create a Hive table over your XML data and then use INSERT INTO <avro table> SELECT FROM <xml table> to write the data in Avro format. Use the following SerDe:
http://stackoverflow.com/questions/41299994/parse-xml-and-store-in-hive-table --> good example of how to use it
NiFi will do the job too, but I would not introduce a new tool just for this batch use case.
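To make the Hive approach concrete, here is a rough sketch. The table names, columns, XPath expressions, and record delimiters below are illustrative assumptions (not taken from the attached files); the SerDe class names are from the XML SerDe used in the linked Stack Overflow post, so check them against the jar you actually add:

```sql
-- Assumes records shaped like: <location><id>1</id><name>Main St</name></location>
-- First register the XML SerDe jar, e.g.:
-- ADD JAR /tmp/hivexmlserde.jar;

CREATE EXTERNAL TABLE xml_locations (id STRING, name STRING)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.id"   = "/location/id/text()",
  "column.xpath.name" = "/location/name/text()"
)
STORED AS
  INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/data/xml/locations'
TBLPROPERTIES (
  "xmlinput.start" = "<location",
  "xmlinput.end"   = "</location>"
);

-- Target table in Avro format (Hive generates the Avro schema from the columns)
CREATE TABLE avro_locations (id STRING, name STRING) STORED AS AVRO;

-- The actual conversion step
INSERT INTO TABLE avro_locations SELECT id, name FROM xml_locations;
```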
I should have been more specific before. My requirement is that the big XML files arrive in real time and need to be ingested through NiFi and converted into Avro format.
I have attached some of the XMLs for your reference. Kindly have a look at them and advise.
I have been reading and found:
1. TransformXML processor - converts XML to JSON format easily, but it requires us to know XSLT.
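For anyone unfamiliar with what TransformXml expects, here is a minimal sketch of an XSLT stylesheet that emits JSON text from a flat record. The element names are hypothetical (not taken from the attached files), and a real stylesheet would also need to escape special characters in the values:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed input: <location><id>1</id><name>Main St</name></location> -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/location">
    <!-- Emit one JSON object per record -->
    <xsl:text>{"id": "</xsl:text>
    <xsl:value-of select="id"/>
    <xsl:text>", "name": "</xsl:text>
    <xsl:value-of select="name"/>
    <xsl:text>"}</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```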
Once you have complex XSD schemas, large volumes of XML files, a streaming requirement, and very large individual XML files, it will be quite hard to convert the XML.
I have written up a blog post that shows how you can fully automate the conversion of XML to Avro using the Flexter converter for XML and JSON. In the post we use the FpML schema, which is one of the most complex and widely used XML data standards. The post also includes an ER diagram and data lineage.