
Introduction

I recently worked on a use case that required heavy XML processing. Instead of writing complex custom code, I was able to do everything easily with NiFi. I thought this might be useful for anyone interested in XML processing in NiFi. In general, this article covers the following:

  • Base64 encoding and decoding of XML messages.
  • Character set conversion from UTF-8 to ISO-8859-1 (Latin-1).
  • XML validation against an XSD.
  • Splitting the XML into smaller chunks.
  • Transforming XML to JSON.
  • Extracting values from the content and writing each record to a uniquely named file.

This is a very generic XML processing flow that can be reused across many business use cases that process XML data.

Apache NiFi Flow

In the sample demo scenario:

  • An external system sends Base64-encoded XML data as a file, which the GetFile processor picks up.
  • The Base64EncodeContent processor, with Mode set to Decode, decodes the Base64 content.
  • The incoming data is UTF-8 with leading BOM bytes; the ConvertCharacterSet processor converts it to ISO-8859-1.
  • The ValidateXml processor validates the XML content against the XML schema.
  • SplitXml splits the validated XML at the level of the root element's children into smaller XML chunks.
  • Each split XML chunk is converted into a JSON object using an XSLT stylesheet and then written to its own file.
  • Each file is named after a unique identifier extracted from the FlowFile content.

[Screenshot: overall NiFi flow]

Processor Configurations

Base64EncodeContent

[Screenshot: Base64EncodeContent configuration]
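To check the decoding step outside NiFi, here is a minimal Python sketch that decodes the sample input listed later in this article, roughly what Base64EncodeContent does with Mode set to Decode. It is an illustration, not NiFi's implementation; `decode_base64_content` is a hypothetical helper name.

```python
import base64

# Base64 sample input from this article (MIME-style 76-character lines)
SAMPLE_INPUT = b"""\
PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiID8+DQo8cGVyc29ucyB4bWxuczp4
c2k9Imh0dHA6Ly93d3cudzMub3JnLzIwMDEvWE1MU2NoZW1hLWluc3RhbmNlIiB4c2k6bm9OYW1l
c3BhY2VTY2hlbWFMb2NhdGlvbj0iaGVhZGVyLnhzZCI+DQogIDxwZXJzb24+DQogICAgPGZ1bGxf
bmFtZT5NUDwvZnVsbF9uYW1lPg0KICAgIDxjaGlsZF9uYW1lPkFCPC9jaGlsZF9uYW1lPg0KICA8
L3BlcnNvbj4NCiAgPHBlcnNvbj4NCiAgICA8ZnVsbF9uYW1lPkdQPC9mdWxsX25hbWU+DQogICAg
PGNoaWxkX25hbWU+Q0Q8L2NoaWxkX25hbWU+DQogIDwvcGVyc29uPg0KICA8cGVyc29uPg0KICAg
IDxmdWxsX25hbWU+SlA8L2Z1bGxfbmFtZT4NCiAgICA8Y2hpbGRfbmFtZT5FRjwvY2hpbGRfbmFt
ZT4NCiAgPC9wZXJzb24+ICANCjwvcGVyc29ucz4=
"""

def decode_base64_content(encoded: bytes) -> bytes:
    # With the default validate=False, b64decode discards characters outside
    # the Base64 alphabet (here, the line breaks) before decoding
    return base64.b64decode(encoded)

decoded = decode_base64_content(SAMPLE_INPUT)
print(decoded.decode("utf-8"))
```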

ConvertCharacterSet

[Screenshot: ConvertCharacterSet configuration]
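A minimal Python sketch of the same conversion, assuming the input is UTF-8 that may start with a byte order mark (BOM). Python's `utf-8-sig` codec decodes UTF-8 and drops a leading BOM if present; encoding with `errors="strict"` raises an error for any character ISO-8859-1 cannot represent instead of silently replacing it. `utf8_to_latin1` is a hypothetical helper name.

```python
UTF8_BOM = b"\xef\xbb\xbf"

def utf8_to_latin1(data: bytes) -> bytes:
    # Decode as UTF-8; "utf-8-sig" strips a leading BOM if one is present
    text = data.decode("utf-8-sig")
    # Re-encode as ISO-8859-1, failing loudly on unrepresentable characters
    return text.encode("iso-8859-1", errors="strict")

sample = UTF8_BOM + "<full_name>José</full_name>".encode("utf-8")
converted = utf8_to_latin1(sample)
print(converted)  # b'<full_name>Jos\xe9</full_name>'
```

Note that the sample XML still declares encoding="UTF-8" after conversion. Because the sample content is pure ASCII, its bytes are identical in both encodings, so parsing is unaffected here; content with non-ASCII characters would need the declaration updated to match.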

ValidateXml

Schema File: /Users/mpandit/jdeveloper/mywork/ClaimProcess/ClaimProcess/Initiate_App.xsd

[Screenshot: ValidateXml configuration]
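The Initiate_App.xsd schema itself is not included in the article, and Python's standard library has no XSD validator. As a plainly different stand-in, this sketch hand-checks the structure a schema for the sample would plausibly enforce: a `persons` root, only `person` children, and each `person` holding a non-empty `full_name` followed by a non-empty `child_name`. These rules are assumptions inferred from the sample data, not the author's schema.

```python
import xml.etree.ElementTree as ET

# Assumed content model, inferred from the sample (not taken from Initiate_App.xsd)
EXPECTED_ROOT = "persons"
EXPECTED_RECORD = "person"
EXPECTED_FIELDS = ["full_name", "child_name"]

def validate_persons(xml_bytes: bytes) -> list[str]:
    """Return a list of problems; an empty list means the document passed."""
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]
    errors = []
    if root.tag != EXPECTED_ROOT:
        errors.append(f"unexpected root element <{root.tag}>")
    for i, record in enumerate(root, start=1):
        if record.tag != EXPECTED_RECORD:
            errors.append(f"child {i}: expected <{EXPECTED_RECORD}>, got <{record.tag}>")
            continue
        fields = [field.tag for field in record]
        if fields != EXPECTED_FIELDS:
            errors.append(f"record {i}: expected fields {EXPECTED_FIELDS}, got {fields}")
        for field in record:
            if not (field.text or "").strip():
                errors.append(f"record {i}: <{field.tag}> is empty")
    return errors

ok = b"<persons><person><full_name>MP</full_name><child_name>AB</child_name></person></persons>"
bad = b"<persons><person><child_name>AB</child_name></person></persons>"
print(validate_persons(ok))  # []
print(validate_persons(bad))
```

A real XSD validator would also check data types and cardinality declared in the schema; this sketch covers only the element structure listed above.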

SplitXml

[Screenshot: SplitXml configuration]
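Splitting "at the root's children level" means each direct child of the root becomes its own document. A minimal sketch of that idea with Python's `xml.etree.ElementTree`; its serialized output differs cosmetically from NiFi's (for example, the unused `xmlns:xsi` declaration is not carried over). `split_xml` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

def split_xml(xml_bytes: bytes) -> list[bytes]:
    """Return one standalone XML fragment per child of the root element."""
    root = ET.fromstring(xml_bytes)
    fragments = []
    for child in root:
        child.tail = None  # drop whitespace that followed the element in its parent
        fragments.append(ET.tostring(child, encoding="utf-8", xml_declaration=True))
    return fragments

doc = b"""<?xml version="1.0" encoding="UTF-8"?>
<persons>
  <person><full_name>MP</full_name><child_name>AB</child_name></person>
  <person><full_name>GP</full_name><child_name>CD</child_name></person>
  <person><full_name>JP</full_name><child_name>EF</child_name></person>
</persons>"""

for fragment in split_xml(doc):
    print(fragment.decode("utf-8"))
```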

TransformXMLToJSON

[Screenshot: TransformXMLToJSON configuration]
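The XSLT stylesheet used by TransformXMLToJSON is not included in the article. As a plain-Python substitute (not XSLT), this sketch maps a flat fragment such as `<person><full_name>GP</full_name>...</person>` to the JSON shape in the sample output, `{"person": {"full_name": ..., "child_name": ...}}`. It assumes one level of child elements holding text; `fragment_to_json` is a hypothetical helper name.

```python
import json
import xml.etree.ElementTree as ET

def fragment_to_json(fragment: bytes) -> str:
    """Map <tag><a>x</a><b>y</b></tag> to {"tag": {"a": "x", "b": "y"}}.

    Attributes and deeper nesting are ignored.
    """
    elem = ET.fromstring(fragment)
    fields = {child.tag: (child.text or "").strip() for child in elem}
    return json.dumps({elem.tag: fields}, indent=2)

fragment = (b'<?xml version="1.0" encoding="UTF-8"?>'
            b"<person><full_name>GP</full_name><child_name>CD</child_name></person>")
print(fragment_to_json(fragment))
```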

EvaluateJsonPath

[Screenshot: EvaluateJsonPath configuration]
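The JSONPath expression configured here is only visible in the screenshot. Assuming it extracts the identifier with something like `$.person.full_name` (a hypothetical path) into a FlowFile attribute, this Python sketch resolves that simple dotted form. It supports only `$.key.key` paths, not wildcards, filters, or array indexes; `eval_simple_json_path` is a hypothetical helper name.

```python
import json

def eval_simple_json_path(document: str, path: str):
    """Resolve a dotted path like '$.person.full_name'; return None if a key is missing."""
    if not path.startswith("$."):
        raise ValueError("path must start with '$.'")
    node = json.loads(document)
    for key in path[2:].split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

record = '{"person": {"full_name": "GP", "child_name": "CD"}}'
attributes = {"full_name": eval_simple_json_path(record, "$.person.full_name")}
print(attributes)  # {'full_name': 'GP'}
```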

UpdateAttribute

[Screenshot: UpdateAttribute configuration]
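UpdateAttribute is presumably used here to set the `filename` attribute from the extracted value, for example with an Expression Language value such as `${full_name}.json`; the exact rule is only in the screenshot, so that expression is an assumption. The Python sketch below mirrors the idea: it builds a file name from the extracted identifier, replaces characters that are unsafe in file names, and writes the JSON record. `output_filename` and `write_record` are hypothetical helpers.

```python
import re
import tempfile
from pathlib import Path

def output_filename(attributes: dict, key: str = "full_name", ext: str = ".json") -> str:
    """Build a file name from an extracted attribute (hypothetical naming rule)."""
    identifier = str(attributes[key])
    # Replace characters that are unsafe in file names with "_"
    return re.sub(r"[^A-Za-z0-9._-]", "_", identifier) + ext

def write_record(out_dir: Path, attributes: dict, content: str) -> Path:
    """Write one JSON record; an existing file with the same name is overwritten."""
    target = out_dir / output_filename(attributes)
    target.write_text(content, encoding="utf-8")
    return target

record_json = '{"person": {"full_name": "GP", "child_name": "CD"}}'
with tempfile.TemporaryDirectory() as tmp:
    written = write_record(Path(tmp), {"full_name": "GP"}, record_json)
    print(written.name)  # GP.json
```

Because `write_text` overwrites, the chosen identifier must be unique per record for every output file to survive.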

Sample Input and Outputs

Input Base64 Encoded XML:

PD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiID8+DQo8cGVyc29ucyB4bWxuczp4
c2k9Imh0dHA6Ly93d3cudzMub3JnLzIwMDEvWE1MU2NoZW1hLWluc3RhbmNlIiB4c2k6bm9OYW1l
c3BhY2VTY2hlbWFMb2NhdGlvbj0iaGVhZGVyLnhzZCI+DQogIDxwZXJzb24+DQogICAgPGZ1bGxf
bmFtZT5NUDwvZnVsbF9uYW1lPg0KICAgIDxjaGlsZF9uYW1lPkFCPC9jaGlsZF9uYW1lPg0KICA8
L3BlcnNvbj4NCiAgPHBlcnNvbj4NCiAgICA8ZnVsbF9uYW1lPkdQPC9mdWxsX25hbWU+DQogICAg
PGNoaWxkX25hbWU+Q0Q8L2NoaWxkX25hbWU+DQogIDwvcGVyc29uPg0KICA8cGVyc29uPg0KICAg
IDxmdWxsX25hbWU+SlA8L2Z1bGxfbmFtZT4NCiAgICA8Y2hpbGRfbmFtZT5FRjwvY2hpbGRfbmFt
ZT4NCiAgPC9wZXJzb24+ICANCjwvcGVyc29ucz4=

Base64 Decoded XML through NiFi:

<?xml version="1.0" encoding="UTF-8" ?>
<persons xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="header.xsd">
  <person>
    <full_name>MP</full_name>
    <child_name>AB</child_name>
  </person>
  <person>
    <full_name>GP</full_name>
    <child_name>CD</child_name>
  </person>
  <person>
    <full_name>JP</full_name>
    <child_name>EF</child_name>
  </person>
</persons>

Output split XML fragments:

Message 1:

<?xml version="1.0" encoding="UTF-8"?><person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <full_name>MP</full_name>
  <child_name>AB</child_name>
</person>

Message 2:

<?xml version="1.0" encoding="UTF-8"?><person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <full_name>GP</full_name>
  <child_name>CD</child_name>
</person>

Message 3:

<?xml version="1.0" encoding="UTF-8"?><person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <full_name>JP</full_name>
  <child_name>EF</child_name>
</person>

JSON output Files:

File output 1:

{
  "person" : {
    "full_name" : "GP",
    "child_name" : "CD"
  }
}

File output 2:

{
  "person" : {
    "full_name" : "MP",
    "child_name" : "AB"
  }
}

File output 3:

{
  "person" : {
    "full_name" : "JP",
    "child_name" : "EF"
  }
}

Testing NiFi DataFlow

Drop the Base64-encoded XML file into the directory watched by GetFile. The flow decodes it, splits it, and writes the JSON representation of each XML record to its own file.
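To produce a test file like the sample input, this Python sketch encodes an XML document as UTF-8, optionally prepends a BOM (the flow description says the incoming data carries one), and Base64-encodes it with `base64.encodebytes`, which wraps the output at 76 characters per line. The temporary directory is a placeholder; in practice, point the target at the directory your GetFile processor watches. `write_test_input` is a hypothetical helper name.

```python
import base64
import tempfile
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"

def write_test_input(xml_text: str, target: Path, add_bom: bool = True) -> bytes:
    """Write xml_text to target as Base64 (76-char lines); return the raw bytes encoded."""
    raw = xml_text.encode("utf-8")
    if add_bom:
        raw = UTF8_BOM + raw
    target.write_bytes(base64.encodebytes(raw))
    return raw

xml_text = """<?xml version="1.0" encoding="UTF-8" ?>
<persons>
  <person><full_name>MP</full_name><child_name>AB</child_name></person>
</persons>"""

# Placeholder location; replace with the GetFile input directory
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "persons.b64"
    raw = write_test_input(xml_text, target)
    encoded_text = target.read_text()
    print(encoded_text)
```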

Apache NiFi Benefits

NiFi's built-in processors largely eliminate the need for custom code to process XML messages.

NiFi handles multi-byte character sets, and ConvertCharacterSet converts between the encodings supported by the JVM, which widens the range of input a flow can accept.

A generic XML processing flow like this one can be saved as a template and reused, which speeds up development of similar flows.


Comments

@milind pandit

Hello Milind,

Could you please share the .xsd and .json files?

Thanks,

Rajeev

Version history
Revision #: 2 of 2
Last update: 08-17-2019 08:24 AM