Created on 11-08-2016 05:24 PM - edited 08-17-2019 08:24 AM
Introduction
I recently worked on a use case that required heavy XML processing. Instead of writing complex custom code, I ended up achieving everything easily with NiFi. I thought this would be useful for anyone interested in XML processing in NiFi. This document covers the following:
- Base64 Encoding and Decoding of XML message.
- Character set conversion from UTF-8 to ISO-8859-1.
- XML validation against the XSD.
- Split the XML into smaller chunks.
- Transform XML to JSON.
- Extract content and write the output to uniquely named files based on that content.
This is a generic XML processing flow that can be reused across many business use cases that process XML data.
Apache NiFi Flow
In the sample demo scenario:
- An external system sends Base64-encoded XML data as a file, which is read by the GetFile processor.
- Next, the Base64EncodeContent processor decodes the Base64 content.
- The incoming data is UTF-8 with leading BOM bytes; the ConvertCharacterSet processor converts it to ISO-8859-1.
- The XML content is validated against the XML schema using the ValidateXml processor.
- The validated XML is split at the level of the root's children into smaller XML chunks.
- Each split XML chunk is converted into a JSON object using XSLT and then written to an individual file.
- Every file is named after a unique identifier taken from the flow content.
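To make the character set step more concrete, here is a minimal Python sketch of the same idea outside NiFi: decode UTF-8 while dropping a leading BOM, then re-encode as ISO-8859-1. The function name `utf8_to_latin1` and the sample element are hypothetical and only serve as illustration; this is not a description of how ConvertCharacterSet is implemented.

```python
import codecs


def utf8_to_latin1(raw):
    """Decode UTF-8 bytes (dropping a leading BOM if present) and re-encode as ISO-8859-1."""
    # The "utf-8-sig" codec removes the BOM when it is there and is a no-op otherwise.
    text = raw.decode("utf-8-sig")
    # The default strict error handling raises UnicodeEncodeError for any
    # character that ISO-8859-1 cannot represent.
    return text.encode("iso-8859-1")


# Hypothetical sample: BOM + an element containing one non-ASCII character (e with acute accent).
sample = codecs.BOM_UTF8 + "<full_name>Jos\u00e9</full_name>".encode("utf-8")
converted = utf8_to_latin1(sample)
print(converted)
```

In this sketch only the bytes change. An `encoding="UTF-8"` attribute in the XML declaration would stay as written, so a downstream consumer may need to be told the new character set separately.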
Processor Configurations
Base64EncodeContent
ConvertCharacterSet
ValidateXml:
Value: /Users/mpandit/jdeveloper/mywork/ClaimProcess/ClaimProcess/Initiate_App.xsd
SplitXml:
TransformXMLToJSON:
EvaluateJsonPath
UpdateAttribute
Sample Input and Outputs
Input Base64 Encoded XML:
PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiID8+DQo8cGVyc29ucyB4bWxuczp4
c2k9Imh0dHA6Ly93d3cudzMub3JnLzIwMDEvWE1MU2NoZW1hLWluc3RhbmNlIiB4c2k6bm9OYW1l
c3BhY2VTY2hlbWFMb2NhdGlvbj0iaGVhZGVyLnhzZCI+DQogIDxwZXJzb24+DQogICAgPGZ1bGxf
bmFtZT5NUDwvZnVsbF9uYW1lPg0KICAgIDxjaGlsZF9uYW1lPkFCPC9jaGlsZF9uYW1lPg0KICA8
L3BlcnNvbj4NCiAgPHBlcnNvbj4NCiAgICA8ZnVsbF9uYW1lPkdQPC9mdWxsX25hbWU+DQogICAg
PGNoaWxkX25hbWU+Q0Q8L2NoaWxkX25hbWU+DQogIDwvcGVyc29uPg0KICA8cGVyc29uPg0KICAg
IDxmdWxsX25hbWU+SlA8L2Z1bGxfbmFtZT4NCiAgICA8Y2hpbGRfbmFtZT5FRjwvY2hpbGRfbmFt
ZT4NCiAgPC9wZXJzb24+ICANCjwvcGVyc29ucz4=
Base64 Decoded XML through NiFi:
<?xml version="1.0" encoding="UTF-8" ?>
<persons xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="header.xsd">
<person>
<full_name>MP</full_name>
<child_name>AB</child_name>
</person>
<person>
<full_name>GP</full_name>
<child_name>CD</child_name>
</person>
<person>
<full_name>JP</full_name>
<child_name>EF</child_name>
</person>
</persons>
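The decoding step can be checked by hand with a few lines of Python. This sketch decodes only the first line of the sample input above; that line is 76 characters long, a multiple of 4, so it can be decoded on its own. The variable names are illustrative.

```python
import base64

# First line of the Base64 sample input shown above.
first_line = "PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiID8+DQo8cGVyc29ucyB4bWxuczp4"

# b64decode returns bytes; the payload here is plain ASCII text.
decoded = base64.b64decode(first_line)
print(decoded.decode("utf-8"))
```

The decoded bytes begin with the XML declaration and the start of the `<persons>` root element, matching the decoded document shown above.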
Output split XML fragments:
Message 1:
<?xml version="1.0" encoding="UTF-8"?><person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<full_name>MP</full_name>
<child_name>AB</child_name>
</person>
Message 2:
<?xml version="1.0" encoding="UTF-8"?><person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<full_name>GP</full_name>
<child_name>CD</child_name>
</person>
Message 3:
<?xml version="1.0" encoding="UTF-8"?><person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<full_name>JP</full_name>
<child_name>EF</child_name>
</person>
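Splitting at the root's children can be approximated in Python with the standard library, as in the sketch below. It assumes a simplified copy of the sample document without the XML declaration and namespace attributes, and the helper name `split_children` is hypothetical. Unlike the NiFi output above, the fragments it produces carry no XML declaration.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the sample document (declaration and xsi attributes omitted).
xml_doc = """<persons>
  <person><full_name>MP</full_name><child_name>AB</child_name></person>
  <person><full_name>GP</full_name><child_name>CD</child_name></person>
  <person><full_name>JP</full_name><child_name>EF</child_name></person>
</persons>"""


def split_children(document):
    """Return each direct child of the root element as its own XML string."""
    root = ET.fromstring(document)
    # tostring() also emits the whitespace that follows each child, so strip it.
    return [ET.tostring(child, encoding="unicode").strip() for child in root]


fragments = split_children(xml_doc)
for number, fragment in enumerate(fragments, start=1):
    print("Message %d: %s" % (number, fragment))
```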
JSON output Files:
File output 1:
{
"person" : {
"full_name" : "GP",
"child_name" : "CD"
}}
File output 2:
{
"person" : {
"full_name" : "MP",
"child_name" : "AB"
}}
File output 3:
{
"person" : {
"full_name" : "JP",
"child_name" : "EF"
}}
Testing NiFi DataFlow
Drop the Base64-encoded XML file into the input directory read by GetFile. The flow processes the file, splits it, and writes a JSON representation of each XML fragment to an individual file.
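When checking the results of a test run, it can help to reproduce the last two stages (fragment to JSON, then file naming) outside NiFi. The Python sketch below does this with a plain dictionary conversion instead of the XSLT used in the flow. It assumes that `full_name` is the unique identifier used for the file name, which the flow description does not confirm; the helper names and the `.json` extension are also hypothetical.

```python
import json
import xml.etree.ElementTree as ET


def person_to_json(fragment):
    """Convert a flat <person> fragment into a JSON string shaped like the sample outputs."""
    elem = ET.fromstring(fragment)
    # Only handles one level of child elements, which is all the sample needs.
    payload = {elem.tag: {child.tag: child.text for child in elem}}
    return json.dumps(payload, indent=2)


def output_filename(json_text):
    """Build a file name from the identifier in the JSON (assumed here to be full_name)."""
    return json.loads(json_text)["person"]["full_name"] + ".json"


fragment = "<person><full_name>MP</full_name><child_name>AB</child_name></person>"
json_text = person_to_json(fragment)
filename = output_filename(json_text)
print(filename)
print(json_text)
```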
Apache NiFi Benefits
Built-in NiFi processors largely eliminate the need for custom code to process XML messages.
NiFi handles multibyte character sets efficiently, which broadens the range of character sets a flow can support.
Generic XML processing flow templates can accelerate the overall development process.