Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Converting xml to csv with avro schema and Nifi controller service

Highlighted

Converting xml to csv with avro schema and Nifi controller service

New Contributor

I have hard time to convert nested xml to csv using ConvertRecord and controller service (Not using custom script).


Input xml has following structure (this is just sample)

<root>
<header>
<year>2019</year>
</header>
<body>
 <climate>
      <month>08</month>
      <day>06</day>
      <temperature>37°C</temperature>
  </climate>
  <transaction>
   <shop>beijing</shop>
   <user>a</user>
   <user>b</user>
   <user>c</user>
   <user>d</user>
   <user>e</user>
   <detail>
       <item>diamond</item>
       <number>10</number>
       <number>7</number>
       <number>8</number>
       <number>4</number>
       <number>8</number>
   </detail>
   <detail>
       <item>ruby</item>
       <number>1</number>
       <number>4</number>
       <number>2</number>
       <number>4</number>
       <number>1</number>
   </detail>
  </transaction>
</body>
</root>

And desired csv output is as follows

year,month,day,temperature,shop,items,users,numbers
2019,08,06,37°C,beijing,diamond,"a,b,c,d,e","10,7,8,4,8"
2019,08,06,37°C,beijing,ruby,"a,b,c,d,e","1,4,2,4,1"


And I defined avro format as follows.

{"type":"record",
"name":"nifiRecord",
"namespace":"org.apache.nifi",
"fields":[{"name":"year","type":["null","int"]},
          {"name":"climate","type":["null",{"type":"record","name":"climateType","fields":[{"name":"month","type":["null","int"]},
          {"name":"day","type":["null","int"]},
          {"name":"temperature","type":["null","string"]}]}]},
          {"name":"transaction","type":["null",{"type":"record","name":"transactionType","fields":         [{"name":"shop","type":["null","string"]},
           {"name":"user","type":["null",{"type":"array","items":"string"}]},{"name":"detail","type":   ["null",{"type":"array","items":{"type":"record","name":"detailType","fields":[{"name":"item","type":["null","string"]},
          {"name":"number","type":["null",{"type":"array","items":"int"}]}]}}]}]}]}]}


However, with the above avro schema to query the xml data, I got following output, which is far from ideal.

110195-1565078007953.png


Initially, I was trying ExecuteScript Processor with Jython to convert this complex xml to csv format, but since it was slow, I wonder if there are ways to do conversion without any coding at all.


I appreciate any advice.