10-22-2017
03:51 AM
Hi @Mohan,

Sure, we can get the results you expect by using these processors:

EvaluateXQuery //keep all the required contents as attributes of the flowfile.
UpdateAttribute //update the contents of the attributes that were extracted by the EvaluateXQuery processor.
ReplaceText //replace the flowfile content with the attributes of the flowfile.
PutHDFS //store the files in HDFS.

EvaluateXQuery Configurations:-
Change the existing properties:
1. Destination to flowfile-attribute
2. Output: Omit XML Declaration to true
Add new properties by clicking the + sign:
1. author //author
2. book //book
3. bookstore //bookstore
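If you want to check what the three properties should capture before building the flow, here is a minimal Python sketch that mimics them on the sample input from this post. It is only an approximation: it uses Python's `xml.etree.ElementTree` instead of NiFi's XQuery engine, and `extract_attributes` is a hypothetical helper name, not something NiFi provides.

```python
import xml.etree.ElementTree as ET

# Sample input from this post.
SAMPLE_XML = """<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="myfile.xsl" ?>
<bookstore specialty="novel">
<book style="autobiography">
<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>
<price>12</price>
</book>
</bookstore>"""


def extract_attributes(xml_text, names=("author", "book", "bookstore")):
    """Hypothetical helper: approximate //name by serializing the first match."""
    root = ET.fromstring(xml_text)
    attributes = {}
    for name in names:
        # iter() also visits the root element, so "bookstore" is found too.
        element = next(root.iter(name))
        # strip() drops the whitespace that follows the closing tag.
        attributes[name] = ET.tostring(element, encoding="unicode").strip()
    return attributes


attributes = extract_attributes(SAMPLE_XML)
for name, value in attributes.items():
    print(name, "=>", value)
```

Each value is the full element as a string, which is why the next processor has to trim the wrapping tags.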
Input:-
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="myfile.xsl" ?>
<bookstore specialty="novel">
<book style="autobiography">
<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>
<price>12</price>
</book>
</bookstore>

Output:-
As you can see in the screenshot, all the contents are added to the flowfile as attributes (book, bookstore, author).

EvaluateXQuery processor configs screenshot:-

UpdateAttribute Processor:-

1. author
${author:replaceAll('<author>([\s\S]+.*)<\/author>','$1')}
This updates the author attribute.
Input to the UpdateAttribute processor:-
<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>
Output:-
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>

2. book
${book:replaceAll('<book\s(.*)>[\s\S]+<\/author>([\s\S]+)<\/book>','$1$2')}
Input:-
<book style="autobiography">
<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>
<price>12</price>
</book>
Output:-
style="autobiography"
<price>12</price>

3. bookstore
${bookstore:replaceAll('.*<bookstore\s(.*?)>[\s\S]+.*','$1')}
Input:-
<bookstore specialty="novel">
<book style="autobiography">
<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>
<price>12</price>
</book>
</bookstore>
Output:-
specialty="novel"
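To try the three regular expressions outside NiFi, here is a rough Python sketch. The Expression Language `replaceAll` function uses Java regular expressions, so this is an emulation, not the same engine: the patterns are copied as-is, but Java's `$1` becomes `\1` in Python. It also assumes the attribute values keep the line breaks of the input, because `.` does not match a newline and the `book` pattern depends on that.

```python
import re

# Attribute values as extracted earlier, assuming line breaks are preserved.
AUTHOR = """<author>
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>
</author>"""
BOOK = '<book style="autobiography">\n' + AUTHOR + "\n<price>12</price>\n</book>"
BOOKSTORE = '<bookstore specialty="novel">\n' + BOOK + "\n</bookstore>"

# 1. author: keep only what sits between <author> and </author>.
new_author = re.sub(r"<author>([\s\S]+.*)<\/author>", r"\1", AUTHOR)

# 2. book: keep the tag's attributes ($1) and whatever follows </author> ($2).
new_book = re.sub(
    r"<book\s(.*)>[\s\S]+<\/author>([\s\S]+)<\/book>", r"\1\2", BOOK
)

# 3. bookstore: keep only the attributes of the opening tag.
new_bookstore = re.sub(r".*<bookstore\s(.*?)>[\s\S]+.*", r"\1", BOOKSTORE)

print(new_author.strip())
print(new_book.strip())
print(new_bookstore)
```

The results still carry the line breaks that surrounded the removed tags, so trim them if the exact whitespace matters to you.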
Configs:-

ReplaceText Processor:-
Change the Replacement Strategy property to Always Replace and use your attributes (bookstore, book, author) in this processor. This overwrites the existing contents of the flowfile with the new content. Add 2 more ReplaceText processors for the book and author attributes.
Output:-
<first-name>Joe</first-name>
<last-name>Bob</last-name>
<award>Trenton Literary Review Honorable Mention</award>

PutHDFS processor:-
Configure the processor and give the directory name where you want to store the data.

Flow Screenshot:-
For testing purposes I have used the GenerateFlowFile processor, but in your case the source processor will be whichever processor you are getting this XML data from.
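To make the ReplaceText step concrete, here is a small Python sketch of what Always Replace does conceptually: the old content is discarded and the evaluated replacement value becomes the new content. The replacement value `${author}` is my assumption (the post only shows it in a screenshot), `always_replace` is a hypothetical name, and Python's `string.Template` only stands in for Expression Language here.

```python
from string import Template

# Attributes after the UpdateAttribute step (values from this post).
attributes = {
    "author": (
        "<first-name>Joe</first-name>\n"
        "<last-name>Bob</last-name>\n"
        "<award>Trenton Literary Review Honorable Mention</award>"
    ),
    "book": 'style="autobiography"\n<price>12</price>',
    "bookstore": 'specialty="novel"',
}


def always_replace(content, replacement_value, attributes):
    """Hypothetical stand-in: ignore the old content, expand ${name} references."""
    return Template(replacement_value).substitute(attributes)


# Assumed replacement value; the old XML content is simply overwritten.
new_content = always_replace("<bookstore>...</bookstore>", "${author}", attributes)
print(new_content)
```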