<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Regarding Converting XML to Avro schema in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Regarding-Converting-XML-to-Avro-schema/m-p/171709#M134002</link>
    <description>&lt;P&gt;Hi Friends,&lt;/P&gt;&lt;P&gt;I need some help here.&lt;/P&gt;&lt;P&gt;I was able to convert one of the XML files in the zip trafficLocs_data_for_simulator.zip to an Avro schema by defining its structure in EvaluateXPath (image attached for reference). Many thanks to &lt;A rel="user" href="https://community.cloudera.com/users/10969/mqureshi.html" nodeid="10969"&gt;@mqureshi&lt;/A&gt; for his help in solving my last question.&lt;/P&gt;&lt;P&gt;Now I want to understand how to handle bigger XML files: do we need to define the complete structure in EvaluateXPath, or is there a simpler way to handle this?&lt;/P&gt;&lt;P&gt;How do we convert the kind of large XML files that exist in real life into Avro? Please advise.&lt;/P&gt;&lt;P&gt;I have attached some XML files for your reference.&lt;/P&gt;&lt;P&gt;Many Thanks,&lt;/P&gt;&lt;P&gt;Ankur&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/12370-evaluate-xpath.png"&gt;evaluate-xpath.png&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/12391-xmls.zip"&gt;xmls.zip&lt;/A&gt;&lt;/P&gt;</description>
    <pubDate>Mon, 13 Feb 2017 01:52:05 GMT</pubDate>
    <dc:creator>ankurkapoor_wor</dc:creator>
    <dc:date>2017-02-13T01:52:05Z</dc:date>
    <item>
      <title>Regarding Converting XML to Avro schema</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Regarding-Converting-XML-to-Avro-schema/m-p/171709#M134002</link>
      <description>&lt;P&gt;Hi Friends,&lt;/P&gt;&lt;P&gt;I need some help here.&lt;/P&gt;&lt;P&gt;I was able to convert one of the XML files in the zip trafficLocs_data_for_simulator.zip to an Avro schema by defining its structure in EvaluateXPath (image attached for reference). Many thanks to &lt;A rel="user" href="https://community.cloudera.com/users/10969/mqureshi.html" nodeid="10969"&gt;@mqureshi&lt;/A&gt; for his help in solving my last question.&lt;/P&gt;&lt;P&gt;Now I want to understand how to handle bigger XML files: do we need to define the complete structure in EvaluateXPath, or is there a simpler way to handle this?&lt;/P&gt;&lt;P&gt;How do we convert the kind of large XML files that exist in real life into Avro? Please advise.&lt;/P&gt;&lt;P&gt;I have attached some XML files for your reference.&lt;/P&gt;&lt;P&gt;Many Thanks,&lt;/P&gt;&lt;P&gt;Ankur&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/12370-evaluate-xpath.png"&gt;evaluate-xpath.png&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="https://community.cloudera.com/legacyfs/online/attachments/12391-xmls.zip"&gt;xmls.zip&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 13 Feb 2017 01:52:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Regarding-Converting-XML-to-Avro-schema/m-p/171709#M134002</guid>
      <dc:creator>ankurkapoor_wor</dc:creator>
      <dc:date>2017-02-13T01:52:05Z</dc:date>
    </item>
    <item>
      <title>Re: Regarding Converting XML to Avro schema</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Regarding-Converting-XML-to-Avro-schema/m-p/171710#M134003</link>
      <description>&lt;P&gt;&lt;A rel="user" href="https://community.cloudera.com/users/15898/ankurkapoorwork.html" nodeid="15898"&gt;@Ankur Kapoor&lt;/A&gt;&lt;/P&gt;&lt;P&gt;If you have really big XML files that are not arriving in real time but are already sitting on machines, then I would not use NiFi. NiFi is meant more for real-time data flow. For a use case where you have large files to import and convert between formats, for example from XML to Avro, I would suggest writing a script that creates a Hive table on your XML data and then uses INSERT INTO &amp;lt;avro table&amp;gt; SELECT FROM &amp;lt;xml table&amp;gt; to write the data in Avro format. Use the following SerDe:&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/dvasilen/Hive-XML-SerDe" target="_blank"&gt;https://github.com/dvasilen/Hive-XML-SerDe&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&lt;A href="http://stackoverflow.com/questions/41299994/parse-xml-and-store-in-hive-table" target="_blank"&gt;http://stackoverflow.com/questions/41299994/parse-xml-and-store-in-hive-table&lt;/A&gt; (a good example of how to use it)&lt;/P&gt;&lt;P&gt;NiFi will do the job too, but I would not introduce a new tool just for this batch use case.&lt;/P&gt;</description>
      <pubDate>Mon, 13 Feb 2017 08:12:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Regarding-Converting-XML-to-Avro-schema/m-p/171710#M134003</guid>
      <dc:creator>mqureshi</dc:creator>
      <dc:date>2017-02-13T08:12:00Z</dc:date>
    </item>
    <item>
      <title>Re: Regarding Converting XML to Avro schema</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Regarding-Converting-XML-to-Avro-schema/m-p/171711#M134004</link>
      <description>&lt;P&gt;Hi &lt;A rel="user" href="https://community.cloudera.com/users/10969/mqureshi.html" nodeid="10969"&gt;@mqureshi&lt;/A&gt;,&lt;/P&gt;&lt;P&gt;I should have been more specific earlier. My requirement is that the big XML files arrive in real time and need to be ingested through NiFi and converted into Avro format.&lt;/P&gt;&lt;P&gt;I had attached some of the XML files for your reference. Kindly have a look at those and advise.&lt;/P&gt;&lt;P&gt;I have been reading and found the following:&lt;/P&gt;&lt;P&gt;1. TransformXML processor - converts XML to JSON format easily, but it requires us to know XSLT.&lt;/P&gt;&lt;P&gt;Kindly advise.&lt;/P&gt;&lt;P&gt;Thanks,&lt;/P&gt;&lt;P&gt;Ankur&lt;/P&gt;</description>
      <pubDate>Mon, 13 Feb 2017 16:39:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Regarding-Converting-XML-to-Avro-schema/m-p/171711#M134004</guid>
      <dc:creator>ankurkapoor_wor</dc:creator>
      <dc:date>2017-02-13T16:39:17Z</dc:date>
    </item>
    <item>
      <title>Re: Regarding Converting XML to Avro schema</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Regarding-Converting-XML-to-Avro-schema/m-p/171712#M134005</link>
      <description>&lt;DIV&gt;&lt;P&gt;Once you have complex XSD schemas, large volumes of XML files, a streaming requirement, and very large XML files, it will be quite hard to convert the XML.&lt;/P&gt;&lt;P&gt;I have written up a blog post that shows how you can fully automate &lt;A href="https://sonra.io/2018/04/26/converting-fpml-xml-avro/"&gt;the conversion of XML to Avro&lt;/A&gt; using the Flexter XML converter for XML and JSON. In the post we use the FpML schema, which is one of the most complex and widely used XML data standard schemas. The post also includes an ER diagram and data lineage.&lt;/P&gt;&lt;/DIV&gt;</description>
      <pubDate>Thu, 26 Apr 2018 16:37:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Regarding-Converting-XML-to-Avro-schema/m-p/171712#M134005</guid>
      <dc:creator>uli_bethke</dc:creator>
      <dc:date>2018-04-26T16:37:35Z</dc:date>
    </item>
  </channel>
</rss>

