Support Questions

Find answers, ask questions, and share your expertise

Converting xml to csv with avro schema and Nifi controller service

I have hard time to convert nested xml to csv using ConvertRecord and controller service (Not using custom script).


Input xml has following structure (this is just sample)

<root>
<header>
<year>2019</year>
</header>
<body>
 <climate>
      <month>08</month>
      <day>06</day>
      <temperature>37°C</temperature>
  </climate>
  <transaction>
   <shop>beijing</shop>
   <user>a</user>
   <user>b</user>
   <user>c</user>
   <user>d</user>
   <user>e</user>
   <detail>
       <item>diamond</item>
       <number>10</number>
       <number>7</number>
       <number>8</number>
       <number>4</number>
       <number>8</number>
   </detail>
   <detail>
       <item>ruby</item>
       <number>1</number>
       <number>4</number>
       <number>2</number>
       <number>4</number>
       <number>1</number>
   </detail>
  </transaction>
</body>
</root>

And desired csv output is as follows

year,month,day,temperature,shop,items,users,numbers
2019,08,06,37°C,beijing,diamond,"a,b,c,d,e","10,7,8,4,8"
2019,08,06,37°C,beijing,ruby,"a,b,c,d,e","1,4,2,4,1"


And I defined avro format as follows.

{"type":"record",
"name":"nifiRecord",
"namespace":"org.apache.nifi",
"fields":[{"name":"year","type":["null","int"]},
          {"name":"climate","type":["null",{"type":"record","name":"climateType","fields":[{"name":"month","type":["null","int"]},
          {"name":"day","type":["null","int"]},
          {"name":"temperature","type":["null","string"]}]}]},
          {"name":"transaction","type":["null",{"type":"record","name":"transactionType","fields":         [{"name":"shop","type":["null","string"]},
           {"name":"user","type":["null",{"type":"array","items":"string"}]},{"name":"detail","type":   ["null",{"type":"array","items":{"type":"record","name":"detailType","fields":[{"name":"item","type":["null","string"]},
          {"name":"number","type":["null",{"type":"array","items":"int"}]}]}}]}]}]}]}


However, with the above avro schema to query the xml data, I got following output, which is far from ideal.

110195-1565078007953.png


Initially, I was trying ExecuteScript Processor with Jython to convert this complex xml to csv format, but since it was slow, I wonder if there are ways to do conversion without any coding at all.


I appreciate any advice.


1 REPLY 1

Explorer

@daisuke_baba  were you able to resolve your issue ?