Member since: 05-16-2019
Posts: 9
Kudos Received: 0
Solutions: 0
08-07-2019
12:58 AM
@Matt Burgess I am also looking to convert deeply nested XML to CSV, and am considering ConvertRecord, UpdateRecord, or JoltTransformRecord. What are the differences between UpdateRecord and JoltTransformRecord? Which one would be suitable?
08-06-2019
12:35 PM
I am having a hard time converting nested XML to CSV using ConvertRecord and a controller service (not using a custom script). The input XML has the following structure (this is just a sample):

<root>
<header>
<year>2019</year>
</header>
<body>
<climate>
<month>08</month>
<day>06</day>
<temperature>37°C</temperature>
</climate>
<transaction>
<shop>beijing</shop>
<user>a</user>
<user>b</user>
<user>c</user>
<user>d</user>
<user>e</user>
<detail>
<item>diamond</item>
<number>10</number>
<number>7</number>
<number>8</number>
<number>4</number>
<number>8</number>
</detail>
<detail>
<item>ruby</item>
<number>1</number>
<number>4</number>
<number>2</number>
<number>4</number>
<number>1</number>
</detail>
</transaction>
</body>
</root>

The desired CSV output is as follows:

year,month,day,temperature,shop,items,users,numbers
2019,08,06,37°C,beijing,diamond,"a,b,c,d,e","10,7,8,4,8"
2019,08,06,37°C,beijing,ruby,"a,b,c,d,e","1,4,2,4,1"

I defined the Avro schema as follows:

{
  "type": "record",
  "name": "nifiRecord",
  "namespace": "org.apache.nifi",
  "fields": [
    { "name": "year", "type": ["null", "int"] },
    { "name": "climate", "type": ["null", {
        "type": "record", "name": "climateType", "fields": [
          { "name": "month", "type": ["null", "int"] },
          { "name": "day", "type": ["null", "int"] },
          { "name": "temperature", "type": ["null", "string"] }
        ]
      }]
    },
    { "name": "transaction", "type": ["null", {
        "type": "record", "name": "transactionType", "fields": [
          { "name": "shop", "type": ["null", "string"] },
          { "name": "user", "type": ["null", { "type": "array", "items": "string" }] },
          { "name": "detail", "type": ["null", { "type": "array", "items": {
              "type": "record", "name": "detailType", "fields": [
                { "name": "item", "type": ["null", "string"] },
                { "name": "number", "type": ["null", { "type": "array", "items": "int" }] }
              ]
            }}]
          }
        ]
      }]
    }
  ]
}

However, querying the XML data with this Avro schema gave me output that is far from ideal. Initially I tried an ExecuteScript processor with Jython to convert this complex XML to CSV, but it was slow, so I am wondering whether the conversion can be done without any custom code at all. I appreciate any advice.
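For reference, since both the sample XML and the desired CSV are given above, the target flattening (one output row per detail element, with the repeated user and number elements collapsed into comma-joined strings) can be sketched in a few lines of plain Python. This is only an illustration of the transformation logic, not a NiFi processor or record-reader configuration:

```python
import csv, io
import xml.etree.ElementTree as ET

# Sample input from the post, abbreviated into a single string.
XML = """<root>
<header><year>2019</year></header>
<body>
<climate><month>08</month><day>06</day><temperature>37°C</temperature></climate>
<transaction>
<shop>beijing</shop>
<user>a</user><user>b</user><user>c</user><user>d</user><user>e</user>
<detail><item>diamond</item>
<number>10</number><number>7</number><number>8</number><number>4</number><number>8</number></detail>
<detail><item>ruby</item>
<number>1</number><number>4</number><number>2</number><number>4</number><number>1</number></detail>
</transaction>
</body>
</root>"""

def flatten(xml_text):
    """Emit one row per <detail>, repeating the header/climate/shop fields."""
    root = ET.fromstring(xml_text)
    year = root.findtext("header/year")
    climate = root.find("body/climate")
    tx = root.find("body/transaction")
    users = ",".join(u.text for u in tx.findall("user"))
    rows = []
    for detail in tx.findall("detail"):
        numbers = ",".join(n.text for n in detail.findall("number"))
        rows.append([year, climate.findtext("month"), climate.findtext("day"),
                     climate.findtext("temperature"), tx.findtext("shop"),
                     detail.findtext("item"), users, numbers])
    return rows

buf = io.StringIO()
writer = csv.writer(buf)  # default quoting wraps the comma-joined fields in quotes
writer.writerow(["year", "month", "day", "temperature", "shop", "items", "users", "numbers"])
writer.writerows(flatten(XML))
print(buf.getvalue())
```

The key point the sketch makes explicit: the transformation is not a simple record conversion but a join of header-level fields onto each element of a nested array, which is why a schema alone (ConvertRecord) does not produce the desired shape.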
Labels:
- Apache NiFi
08-02-2019
02:25 PM
@Riccardo Iacomini Thank you for the great post! This is very helpful. I am wondering how you batch things together, i.e. produce a file with many CSV rows instead of one row each. If we want to batch CSV rows into multiple-row files, we would use the MergeContent processor, but you also mention that MergeContent is costly. So how should batch processing work in NiFi?
07-22-2019
05:12 AM
@Shu Thank you, and you are right. Now it processes, and faster than before. I am just wondering about the combination of load balancing and concurrent tasks. If we assign concurrent tasks to a CPU-intensive processor such as ExecuteScript or ExecuteStreamCommand, running on all nodes, will load-balancing the queued-up data before the processor give a better result than simply running it with concurrent tasks? I thought it would take some time to distribute relatively large data (~5 MB) across the cluster. And why wouldn't PutHDFS work with some concurrent tasks and load balancing together?
07-19-2019
08:44 AM
I have a problem where a processor whose incoming connection queue is full (back pressure is applied) does not work at all, as in the picture below. Once I delete the queue, it works fine. But this will be problematic once it goes into the operational phase. Can someone tell me why this is happening and how to fix it? Also, are there any ways to improve the I/O of PutHDFS other than assigning more concurrent tasks to it? Thank you.
Labels:
- Apache Hadoop
- Apache NiFi
05-22-2019
08:49 AM
I am afraid this is a very basic question, but I could not find an answer after spending a couple of hours, so I am posting it here. I'd like to add a newline when merging different CSV flowfiles with MergeContent, so that the output is in the right shape. I stumbled upon a couple of articles, but none of them worked for me, e.g. the following link: https://stackoverflow.com/questions/34257508/apache-nifi-mergecontent-processor-set-demarcator-as-new-line Here is my configuration. I tried Shift+Enter, \n, and so on, but could not get the output I need. I would really appreciate it if you could tell me how to do this. Thank you in advance.
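For what it's worth, the effect of the Demarcator property can be modelled outside NiFi: with binary concatenation, MergeContent inserts the demarcator text between the contents of consecutive flowfiles. A tiny Python sketch (the single-row CSV payloads below are made up) shows why a literal newline is needed:

```python
# Hypothetical contents of three queued flowfiles, each one CSV row
# with no trailing newline.
flowfiles = ["1,alice,10", "2,bob,20", "3,carol,30"]

# With no demarcator, the rows run together into one unusable line.
no_demarcator = "".join(flowfiles)

# With a newline demarcator (what a literal newline in the Demarcator
# property is meant to provide), each row lands on its own line.
newline_demarcator = "\n".join(flowfiles)
print(newline_demarcator)
```

So the demarcator only helps if it is stored as an actual newline character; a two-character literal backslash-n in the property would be inserted verbatim between rows instead.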
Labels:
- Apache NiFi