EvaluateXPath can't return multiple node values

Explorer

I am trying to parse an XML file to extract the header row (exactly one row at the top of the TSV file) and the data rows, and write them out as a single text file in TSV format. However, I am unable to extract two different node types (COLUMNS and DATA), and the XML can contain multiple DATA elements. When Destination is set to "flowfile-content", I can't add two different node types (COLUMNS and DATA), and I can't get multiple <DATA> nodes either. Please suggest how I can grab the COLUMNS and DATA values and write them to a tab-separated values file. A sample XML file:

<?xml version="1.0" encoding="UTF-8" ?>
<COMPS ReplyCode="0" ReplyText="Operation Successful">
<COUNT Records="258"/>
<DELIMITER value="09"/>
<COLUMNS>Column1Column2Column3Column4Column5 </COLUMNS>
<DATA>value11 value12 value13 value14 value15</DATA>
<DATA>value21 value22 value23 value24 value25</DATA>
</COMPS>

screen-shot-2017-10-12-at-51939-pm.png

1 ACCEPTED SOLUTION


Re: EvaluateXPath can't return multiple node values

Super Guru
@Putta Challa

Can you try using the EvaluateXQuery processor?

Set Destination to flowfile-attribute

and add the properties below:-

data

//DATA

Extracts all the DATA nodes and keeps them as attributes on the flowfile.

columns

//COLUMNS

Extracts the COLUMNS node and keeps it as an attribute on the flowfile.

all_data

string-join((for $x in //DATA return $x/text()), '09')

Gets all the DATA nodes and separates them with the literal string 09.

columns_data

string-join((for $y in (for $x in /COMPS return string-join(($x/DATA/text() , $x/COLUMNS/text()), '09')) return $y), '09')

Joins the DATA and COLUMNS node values into one string and keeps it as the columns_data attribute.

40843-xquery.png

Then use a ReplaceText processor to create the new flowfile content.

ReplaceText configs:-

Change Replacement Value to

${data.1} //there are 2 DATA nodes here; this is the data.1 attribute value
${data.2} //data.2 attribute value
${columns} //COLUMNS node value
${all_data} //all the DATA values joined with the 09 separator
${columns_data} //all the DATA and COLUMNS values joined with the 09 separator

40844-replace-text.png

Output:-

value11 value12 value13 value14 value15 
value21 value22 value23 value24 value25
Column1Column2Column3Column4Column5 
value11 value12 value13 value14 value1509value21 value22 value23 value24 value25
value11 value12 value13 value14 value1509value21 value22 value23 value24 value2509Column1Column2Column3Column4Column5 
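
To make it easier to see what each of the five attributes holds, here is a rough Python sketch that mimics the queries outside NiFi. It is not NiFi code: xml.etree.ElementTree stands in for EvaluateXQuery, and the function name emulate_attributes and the dictionary keys are only illustrative names that mirror the property names above.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<COMPS ReplyCode="0" ReplyText="Operation Successful">
<COUNT Records="258"/>
<DELIMITER value="09"/>
<COLUMNS>Column1Column2Column3Column4Column5 </COLUMNS>
<DATA>value11 value12 value13 value14 value15</DATA>
<DATA>value21 value22 value23 value24 value25</DATA>
</COMPS>"""


def emulate_attributes(xml_text):
    """Rough stand-in for the five EvaluateXQuery properties (assumption, not NiFi)."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    data = [d.text or "" for d in root.iter("DATA")]        # like //DATA
    columns = [c.text or "" for c in root.iter("COLUMNS")]  # like //COLUMNS

    # One attribute per DATA node: data.1, data.2, ...
    attrs = {f"data.{i}": value for i, value in enumerate(data, start=1)}
    attrs["columns"] = columns[0]
    # string-join over the DATA text nodes with the literal separator '09'
    attrs["all_data"] = "09".join(data)
    # DATA values first, then COLUMNS, all joined with '09'
    attrs["columns_data"] = "09".join(data + columns)
    return attrs


attrs = emulate_attributes(SAMPLE)
for name in ("data.1", "data.2", "columns", "all_data", "columns_data"):
    print(attrs[name])
```

Note that the separator here is the two characters 09, exactly as in the queries, not a tab character.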

Sample Flow:-

40845-flow-xml.png

Method 2:-

If you just want to write COLUMNS and DATA to a new file, that is easy to achieve with a ReplaceText processor with these properties:

Change Search Value to

[\s\S]{1,}<COLUMNS>(.*)<\/COLUMNS>[\r\n]+<DATA>(.*)<\/DATA>[\r\n]+<DATA>(.*)<\/DATA>[\s\S]{1,}

ReplaceText Search Value Config:-

40841-searchvalue-prop.png

and Replacement Value to

$1
$2
$3

Here we reference all the captured groups in the Replacement Value; the processor replaces the content of the flowfile with the new content given in the Replacement Value property.

40842-rt-config.png

So this ReplaceText processor gets the following input file:

<?xml version="1.0" encoding="UTF-8" ?>
<COMPS ReplyCode="0" ReplyText="Operation Successful">
<COUNT Records="258"/>
<DELIMITER value="09"/>
<COLUMNS>Column1Column2Column3Column4Column5 </COLUMNS>
<DATA>value11 value12 value13 value14 value15</DATA>
<DATA>value21 value22 value23 value24 value25</DATA>
</COMPS>

Output:-

Column1Column2Column3Column4Column5 
value11 value12 value13 value14 value15
value21 value22 value23 value24 value25
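
The search pattern can be tried outside NiFi as well. The sketch below uses Python's re module as a stand-in for ReplaceText, which is an assumption: NiFi uses Java regular expressions, and the replacement is written \1, \2, \3 here instead of $1, $2, $3. The names SEARCH, REPLACEMENT and SAMPLE are only illustrative.

```python
import re

# Same Search Value as above, unchanged.
SEARCH = (
    r"[\s\S]{1,}<COLUMNS>(.*)<\/COLUMNS>[\r\n]+"
    r"<DATA>(.*)<\/DATA>[\r\n]+<DATA>(.*)<\/DATA>[\s\S]{1,}"
)
# Python spelling of the three-line Replacement Value $1 / $2 / $3.
REPLACEMENT = r"\1\n\2\n\3"

SAMPLE = """<?xml version="1.0" encoding="UTF-8" ?>
<COMPS ReplyCode="0" ReplyText="Operation Successful">
<COUNT Records="258"/>
<DELIMITER value="09"/>
<COLUMNS>Column1Column2Column3Column4Column5 </COLUMNS>
<DATA>value11 value12 value13 value14 value15</DATA>
<DATA>value21 value22 value23 value24 value25</DATA>
</COMPS>"""

# The pattern matches the whole document, so the result is just the
# three captured groups, one per line.
result = re.sub(SEARCH, REPLACEMENT, SAMPLE)
print(result)
```

Because the pattern spells out exactly two <DATA> groups, it only fits input with two DATA rows. A third DATA element would be swallowed by the trailing [\s\S]{1,} and dropped from the output.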

You can use whichever way best fits your case :).


Re: EvaluateXPath can't return multiple node values

Explorer

error message:

18:26:39 UTC ERROR 16fb46ea-015f-1000-0000-00007c5fd65b 172.31.192.18:8080

EvaluateXPath[id=16fb46ea-015f-1000-0000-00007c5fd65b] Routing StandardFlowFileRecord[uuid=b941988f-f925-4ec2-a3e5-2830e47cbe8b,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1507917338870-99, container=default, section=99], offset=204551, length=204551],offset=0,name=listings.xml,size=204551] to 'failure' because the XPath evaluated to 257 XML nodes

Re: EvaluateXPath can't return multiple node values

Explorer

@Shu, I am close to resolving this with Method 1 as you suggested, except that the number of DATA elements is variable (could be thousands) and COLUMNS is a single XML node with 200+ fields. The output should be:

  1. Column1Column2Column3Column4Column5
  2. value11 value12 value13 value14 value15
  3. value21 value22 value23 value24 value25
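
For reference, the desired result for any number of DATA elements can be sketched outside NiFi. This is plain Python with xml.etree.ElementTree, not a NiFi processor configuration, and it is not something confirmed in this thread; the function name comps_to_lines and the third DATA row in the sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Sample from the question plus one hypothetical extra DATA row.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8" ?>
<COMPS ReplyCode="0" ReplyText="Operation Successful">
<COUNT Records="258"/>
<DELIMITER value="09"/>
<COLUMNS>Column1Column2Column3Column4Column5 </COLUMNS>
<DATA>value11 value12 value13 value14 value15</DATA>
<DATA>value21 value22 value23 value24 value25</DATA>
<DATA>value31 value32 value33 value34 value35</DATA>
</COMPS>"""


def comps_to_lines(xml_bytes):
    """Return the COLUMNS text followed by every DATA text, one per line."""
    root = ET.fromstring(xml_bytes)
    header = root.findtext("COLUMNS", default="")
    # findall returns every direct DATA child, however many there are.
    rows = [d.text or "" for d in root.findall("DATA")]
    # Text is kept verbatim, including any whitespace inside the elements.
    return "\n".join([header] + rows) + "\n"


output = comps_to_lines(SAMPLE)
print(output, end="")
```

The header line comes out first and each DATA element becomes one line after it, so nothing in the code depends on knowing the row count in advance.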

Re: EvaluateXPath can't return multiple node values

New Contributor

@Shu, I tried the first solution but I encountered the error in the attached screenshot. Please help me resolve this issue, as it is urgent for me.

untitled.png

Re: EvaluateXPath can't return multiple node values

Super Guru
@satyadevi jagata

Change the following property value in the EvaluateXQuery processor:

Destination

flowfile-attribute

Then try re-running the processor.

If this property is set to flowfile-content, the processor doesn't allow more than one query to be added.

If the issue still isn't resolved, please open a new question for more visibility in the community, and add all the details of what you have tried so far plus sample data to reproduce the issue.

Re: EvaluateXPath can't return multiple node values

New Contributor

Hi, I tried the first solution but I encountered the following error. Please suggest a solution.

untitled.png
