Created 10-13-2017 03:41 PM
I am trying to parse an XML file to extract the header row (exactly one row at the top of the TSV file) and the data rows, and write them to a single text file in TSV format. However, I was unable to extract two different node types (COLUMNS and DATA), and multiple DATA elements can be present in the XML. A sample XML file is below. If Destination is "flowfile-content", I can't add two different node types (COLUMNS and DATA), and I also can't get multiple <DATA> nodes. Please suggest how I can grab the COLUMNS and DATA and write them to a tab-separated value file.
<?xml version="1.0" encoding="UTF-8" ?>
<COMPS ReplyCode="0" ReplyText="Operation Successful">
<COUNT Records="258"/>
<DELIMITER value="09"/>
<COLUMNS>Column1Column2Column3Column4Column5 </COLUMNS>
<DATA>value11 value12 value13 value14 value15</DATA>
<DATA>value21 value22 value23 value24 value25</DATA>
</COMPS>
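For reference, the transformation being asked for can be sketched outside NiFi in Python with the standard library's ElementTree. This is a minimal sketch under stated assumptions, not a NiFi solution: it assumes the fields inside COLUMNS and each DATA are already separated by the character whose decimal code is given in DELIMITER (so "09" is read as a tab; the tabs appear to have been lost from the sample above). SAMPLE and comps_to_tsv are hypothetical names. It may be handy for checking what the NiFi flow should produce.

```python
import xml.etree.ElementTree as ET

# Hypothetical reconstruction of the question's sample with tab delimiters restored
# (assumption: the forum post lost them); three columns instead of five for brevity.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<COMPS ReplyCode="0" ReplyText="Operation Successful">
<COUNT Records="2"/>
<DELIMITER value="09"/>
<COLUMNS>Column1\tColumn2\tColumn3 </COLUMNS>
<DATA>value11\tvalue12\tvalue13</DATA>
<DATA>value21\tvalue22\tvalue23</DATA>
</COMPS>"""


def comps_to_tsv(xml_bytes: bytes) -> str:
    """Return TSV text: the COLUMNS row first, then one row per DATA element."""
    root = ET.fromstring(xml_bytes)
    # Assumption: DELIMITER holds a decimal character code, so "09" means chr(9), a tab.
    delim_el = root.find("DELIMITER")
    delimiter = chr(int(delim_el.get("value"))) if delim_el is not None else "\t"
    # Trim stray spaces/newlines but keep tabs, so empty first/last fields survive.
    header = (root.findtext("COLUMNS") or "").strip(" \r\n")
    n_fields = len(header.split(delimiter))
    lines = [header]
    for data in root.findall("DATA"):
        row = (data.text or "").strip(" \r\n")
        if len(row.split(delimiter)) != n_fields:
            raise ValueError(f"field count differs from header: {row!r}")
        lines.append(row)
    return "\n".join(lines) + "\n"


print(comps_to_tsv(SAMPLE), end="")
```

This prints the header line followed by the two data lines, each tab-separated. The field-count check is a design choice so that a truncated or mis-delimited row fails loudly instead of shifting columns.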
Created on 10-14-2017 02:27 AM - edited 08-17-2019 08:50 PM
Can you try using the EvaluateXQuery processor?
Set Destination to flowfile-attribute and add the properties below:
data
//DATA
Extracts all the DATA nodes and stores them as flowfile attributes (data.1, data.2, ... when there is more than one).
columns
//COLUMNS
Extracts the COLUMNS node and stores it as the flowfile attribute columns.
all_data
string-join((for $x in //DATA return $x/text()), '09')
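Joins the text of all DATA nodes, separated by 09. Note that '09' here is the literal two-character string "09", not a tab, which is why "09" appears between the values in the output; codepoints-to-string(9) would give a tab and codepoints-to-string(10) a line feed.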
columns_data
string-join((for $y in (for $x in /COMPS return string-join(($x/DATA/text() , $x/COLUMNS/text()), '09')) return $y), '09')
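Joins the DATA and COLUMNS node values into one string and stores it as the attribute columns_data. Note that this puts the DATA values first and the COLUMNS value last; for a TSV with the header on top, list $x/COLUMNS/text() before $x/DATA/text().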
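To make the four properties concrete, the following rough Python emulation mimics what they compute, using ElementTree lookups as a stand-in for the XQuery expressions (this is not how NiFi evaluates them). Assumptions: fields inside COLUMNS and DATA are tab-separated, attribute values are modeled as plain text (how NiFi serializes element results is not modeled), and multiple results get the one-up .1, .2 suffixes that ${data.1} and ${data.2} rely on. SAMPLE, texts, to_attributes, evaluate and header_first_tsv are hypothetical names; header_first_tsv shows the order a TSV needs, COLUMNS first and rows separated by line feeds.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the tab separators inside COLUMNS and DATA are an assumption.
SAMPLE = b"""<COMPS ReplyCode="0" ReplyText="Operation Successful">
<COLUMNS>Column1\tColumn2</COLUMNS>
<DATA>value11\tvalue12</DATA>
<DATA>value21\tvalue22</DATA>
</COMPS>"""


def texts(root, tag):
    """Text of every `tag` element; like //TAG/text(), empty elements contribute nothing."""
    return [el.text for el in root.iter(tag) if el.text]


def to_attributes(name, results):
    """One result -> `name`; several -> `name.1`, `name.2`, ... (one-up suffixes)."""
    if len(results) == 1:
        return {name: results[0]}
    return {f"{name}.{i}": value for i, value in enumerate(results, start=1)}


def evaluate(xml_bytes, sep="09"):
    """Emulate the data, columns, all_data and columns_data properties."""
    root = ET.fromstring(xml_bytes)
    data, columns = texts(root, "DATA"), texts(root, "COLUMNS")
    attrs = {}
    attrs.update(to_attributes("data", data))         # //DATA
    attrs.update(to_attributes("columns", columns))   # //COLUMNS
    attrs["all_data"] = sep.join(data)                # '09' is a literal two-character string
    attrs["columns_data"] = sep.join(data + columns)  # DATA first, COLUMNS last, as posted
    return attrs


def header_first_tsv(xml_bytes):
    """Hypothetical helper: header row first, one row per line."""
    root = ET.fromstring(xml_bytes)
    return "\n".join(texts(root, "COLUMNS") + texts(root, "DATA"))


print(evaluate(SAMPLE))
print(header_first_tsv(SAMPLE))
```

The first print shows "09" glued between the values of all_data and columns_data; the second shows the header-first, line-per-row layout a TSV file expects.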
Then use a ReplaceText processor to write the new flowfile content.
ReplaceText configuration:
Change Replacement Value to
${data.1}
${data.2}
${columns}
${all_data}
${columns_data}
Here ${data.1} and ${data.2} hold the two DATA node values (there are two DATA nodes in this sample), ${columns} holds the COLUMNS node value, ${all_data} holds all DATA values joined with 09, and ${columns_data} holds all DATA and COLUMNS values joined with 09.
Output:
value11 value12 value13 value14 value15
value21 value22 value23 value24 value25
Column1Column2Column3Column4Column5
value11 value12 value13 value14 value1509value21 value22 value23 value24 value25
value11 value12 value13 value14 value1509value21 value22 value23 value24 value2509Column1Column2Column3Column4Column5
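Roughly, Replacement Value just splices attribute values into the new content. The sketch below imitates that for bare ${name} references only; it is a simplification, not NiFi's Expression Language (function calls such as ${columns:toUpper()} are left untouched), and it assumes a missing attribute renders as an empty string. ATTRIBUTE_REF, render_replacement and the attribute values are hypothetical.

```python
import re

# Bare ${name} references only; anything with a ':' (an EL function call) is not matched.
ATTRIBUTE_REF = re.compile(r"\$\{([^}:]+)\}")


def render_replacement(template: str, attributes: dict) -> str:
    """Replace each ${name} with the attribute value (assumption: missing -> empty string)."""
    return ATTRIBUTE_REF.sub(lambda m: attributes.get(m.group(1), ""), template)


attributes = {
    "data.1": "value11\tvalue12",
    "data.2": "value21\tvalue22",
    "columns": "Column1\tColumn2",
    "all_data": "value11\tvalue1209value21\tvalue22",
}
print(render_replacement("${columns}\n${all_data}", attributes))
```

Rendering ${columns} and ${all_data} on separate lines gives the header followed by a single line in which both rows are glued together with 09, which is why a line-feed separator between DATA rows matters for a real TSV.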
[Screenshot: sample flow]
Method 2:
If you just want to write COLUMNS and DATA to a new file, that is easier: you can get that result with a ReplaceText processor using these properties. Because the pattern spans several lines, Evaluation Mode should be set to Entire text.
Change Search Value to
[\s\S]{1,}<COLUMNS>(.*)<\/COLUMNS>[\r\n]+<DATA>(.*)<\/DATA>[\r\n]+<DATA>(.*)<\/DATA>[\s\S]{1,}
[Screenshot: ReplaceText Search Value configuration]
and Replacement Value to
$1
$2
$3
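This replaces the entire content of the flowfile with the captured groups referenced in Replacement Value. Note that the pattern captures exactly one COLUMNS element and two DATA elements: any additional DATA elements are swallowed by the trailing [\s\S]{1,}, and with fewer than two the pattern does not match.
So this ReplaceText processor takes the following input: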
<?xml version="1.0" encoding="UTF-8" ?>
<COMPS ReplyCode="0" ReplyText="Operation Successful">
<COUNT Records="258"/>
<DELIMITER value="09"/>
<COLUMNS>Column1Column2Column3Column4Column5 </COLUMNS>
<DATA>value11 value12 value13 value14 value15</DATA>
<DATA>value21 value22 value23 value24 value25</DATA>
</COMPS>
Output:
Column1Column2Column3Column4Column5
value11 value12 value13 value14 value15
value21 value22 value23 value24 value25
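The Method 2 regex can be tried out in Python, whose re module should treat this particular pattern the same way as Java's regex (an assumption worth double-checking in NiFi itself). The sketch applies it to the whole content at once, as ReplaceText's Entire text mode would, and shows the limitation of a fixed number of DATA groups: a third DATA row is silently swallowed by the trailing [\s\S]{1,}, and with only one DATA element the pattern does not match, so the content is left unchanged. SEARCH, REPLACEMENT, method2, TWO_ROWS and THREE_ROWS are hypothetical names.

```python
import re

# Search Value from Method 2, unchanged (assumed to behave the same in Python and Java).
SEARCH = r"[\s\S]{1,}<COLUMNS>(.*)<\/COLUMNS>[\r\n]+<DATA>(.*)<\/DATA>[\r\n]+<DATA>(.*)<\/DATA>[\s\S]{1,}"
# Python spelling of the Replacement Value "$1", "$2", "$3" on separate lines.
REPLACEMENT = r"\1\n\2\n\3"


def method2(content: str) -> str:
    """Apply the regex to the whole content at once (like ReplaceText's Entire text mode)."""
    return re.sub(SEARCH, REPLACEMENT, content)


TWO_ROWS = """<?xml version="1.0" encoding="UTF-8" ?>
<COMPS ReplyCode="0" ReplyText="Operation Successful">
<COLUMNS>Column1\tColumn2</COLUMNS>
<DATA>value11\tvalue12</DATA>
<DATA>value21\tvalue22</DATA>
</COMPS>"""
THREE_ROWS = TWO_ROWS.replace("</COMPS>", "<DATA>value31\tvalue32</DATA>\n</COMPS>")

print(method2(TWO_ROWS))    # header plus both rows
print(method2(THREE_ROWS))  # same output: the value31/value32 row is swallowed
```

For a variable number of rows, a pattern with a fixed number of groups cannot keep up, so an approach that joins every DATA element is needed.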
You can use whichever approach best fits your case :).
Created 10-13-2017 06:36 PM
error message:
18:26:39 UTC
ERROR
16fb46ea-015f-1000-0000-00007c5fd65b
172.31.192.18:8080
EvaluateXPath[id=16fb46ea-015f-1000-0000-00007c5fd65b] Routing StandardFlowFileRecord[uuid=b941988f-f925-4ec2-a3e5-2830e47cbe8b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1507917338870-99, container=default, section=99], offset=204551, length=204551],offset=0,name=listings.xml,size=204551] to 'failure' because the XPath evaluated to 257 XML nodes
Created 10-16-2017 10:03 PM
@Shu, I am close to resolving this with Method 1 you suggested, except that the number of DATA elements is variable (there could be thousands), and COLUMNS is a single XML node with 200+ fields, and the output should be
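With a variable number of DATA elements, approaches that name rows individually (${data.1}, ${data.2}, or a regex with a fixed number of DATA groups) will miss rows. Within EvaluateXQuery, one untested option is a single query with Destination set to flowfile-content that puts the header first and joins every row with a line feed, something like string-join((/COMPS/COLUMNS/text(), /COMPS/DATA/text()), codepoints-to-string(10)). As a rough illustration of the same transformation for large inputs, here is a streaming Python sketch using ElementTree.iterparse, which handles elements one at a time and clears each after writing it to keep memory use low. stream_comps_to_tsv is a hypothetical name, and the fields are assumed to be tab-separated already.

```python
import io
import xml.etree.ElementTree as ET


def stream_comps_to_tsv(source, out) -> int:
    """Write the COLUMNS line, then one line per DATA element; return the DATA row count.

    `source` is a path or a binary file object. Fields are assumed to be tab-separated already.
    """
    rows = 0
    header_seen = False
    for _event, el in ET.iterparse(source, events=("end",)):
        if el.tag == "COLUMNS":
            out.write((el.text or "").strip(" \r\n") + "\n")
            header_seen = True
        elif el.tag == "DATA":
            if not header_seen:
                raise ValueError("DATA element found before COLUMNS")
            out.write((el.text or "").strip(" \r\n") + "\n")
            rows += 1
        el.clear()  # drop this element's text and children once it has been handled


    return rows


# Hypothetical input; a real reply could contain thousands of DATA rows.
SAMPLE = b"""<COMPS><COLUMNS>Column1\tColumn2</COLUMNS>
<DATA>value11\tvalue12</DATA>
<DATA>value21\tvalue22</DATA></COMPS>"""

buffer = io.StringIO()
count = stream_comps_to_tsv(io.BytesIO(SAMPLE), buffer)
print(count, repr(buffer.getvalue()))
```

Writing rows as they are parsed means the output size, not the number of DATA elements, dominates the cost; the header-before-data check simply guards the assumption that COLUMNS comes first in the reply.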
Created 09-20-2018 05:26 AM
@Shu, I tried the first solution, but I encountered the attached error. Please help me resolve this issue; it is urgent for me.
Created 09-20-2018 11:36 PM
Change the following property value in the EvaluateXQuery processor:
Destination
flowfile-attribute
Then try to rerun the processor.
If this property is set to flowfile-content, the processor doesn't allow more than one query to be added.
If the issue still isn't resolved, please open a new question for more visibility in the community, and add all the details of what you have tried so far along with sample data to reproduce the issue.
Created 09-20-2018 01:14 PM
Hi, I have tried the first solution, but I encountered the following error. Please suggest a solution.