Apache NiFi - Transform Fixed Width File into Delimited File?

New Contributor

Hello!

 

I'm new to NiFi and struggling with what seems like a simple task.

I have a file with a header, body and trailer in a fixed-width layout.

file.PNG

 

The header record always starts with an A;

The Trailer record always starts with a Z;

Body records are identified by B, C, D and so on... (except A and Z)

 

So I first used RouteText to separate the header, body and trailer records. My goal is to replace the fixed-width layout with a delimiter (;) so I can make sense of the information in the file. The header, body and trailer will each have a different number of columns once they are delimited.

 

Then I used ReplaceText with a regex to create columns delimited by semicolons instead of fixed widths.
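
As a rough illustration of this kind of substitution outside NiFi, here is a minimal Python sketch. The column widths (1, 5, 3) and the sample line are hypothetical, since the real layout is only visible in the screenshot. In ReplaceText, the equivalent would be a search pattern with one capture group per column and a replacement value along the lines of `$1;$2;$3`.

```python
import re

# Hypothetical layout: record type (1 char), id (5 chars), code (3 chars).
WIDTHS = (1, 5, 3)

# One capture group per fixed-width column, e.g. ^(.{1})(.{5})(.{3})
PATTERN = re.compile("^" + "".join(f"(.{{{w}}})" for w in WIDTHS))


def to_delimited(line: str, sep: str = ";") -> str:
    """Turn one fixed-width line into sep-delimited, whitespace-trimmed fields."""
    match = PATTERN.match(line)
    if match is None:
        raise ValueError(f"line shorter than the expected layout: {line!r}")
    return sep.join(field.strip() for field in match.groups())


print(to_delimited("B00042XY "))
```

Header, body and trailer records would each need their own width list, which is why they are routed to separate paths first.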

 

Now I need to regroup the rows and create a single file again, with the header, body and trailer, but this time all separated by semicolons. This is what I'd like to achieve:

 

FILE2.PNG

My template looks like this:

flow.PNG

 

Is that possible? I tried using MergeRecord for that but I really don't know how to configure its properties, and it's not merging anything.

Master Mentor

@AnnaBea 

Let me make sure I am clear on your ask here:

1. You have successfully split your source file into 3 parts (header line, body line(s), and footer line).
2. You have successfully modified all three split files as needed.
3. You are having issues reassembling the three split files back into one file, in the order header, body, footer, using the MergeRecord processor?

With this particular dataflow design, the MergeRecord processor is not likely what you want to use. You probably want to use the MergeContent processor instead, with a "Merge Strategy" of "Defragment". But getting these three source FlowFiles merged in a specific order requires some additional work in your upstream flow. In order to use "Defragment", your three source FlowFiles all need to have these FlowFile attributes:

fragment.identifier: All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute.
fragment.index: A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile.
fragment.count: The number of split FlowFiles generated from the parent FlowFile.

 

1. Add one UpdateAttribute processor before your RouteText and configure it to create the "fragment.identifier" attribute with a value of "${UUID()}" and another attribute "fragment.count" with a value of "3". Each FlowFile produced by RouteText should then have these two attributes set on it.

2. Then add one UpdateAttribute processor to each of the 3 flow paths to set the "fragment.index" attribute uniquely per dataflow path: value=1 for header, value=2 for body, and value=3 for footer.
3. Now MergeContent will have what it needs to bin these three files by the UUID and merge them in the proper order.
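
To make the binning and ordering in the steps above concrete, here is a minimal Python sketch of the idea. It is only a simulation under stated assumptions, not NiFi's actual implementation: FlowFiles are modeled as plain dicts, and the `make` helper, the "parent-1" identifier, and the sample record contents are all hypothetical.

```python
from collections import defaultdict


def defragment(flowfiles):
    """Merge fragments sharing fragment.identifier, ordered by fragment.index.

    Each flowfile is a dict: {"attributes": {...}, "content": str}.
    A group is merged only when all fragment.count pieces have arrived.
    """
    bins = defaultdict(list)
    for ff in flowfiles:
        bins[ff["attributes"]["fragment.identifier"]].append(ff)

    merged = {}
    for identifier, fragments in bins.items():
        expected = int(fragments[0]["attributes"]["fragment.count"])
        if len(fragments) != expected:
            continue  # incomplete bin: nothing is merged yet
        ordered = sorted(fragments, key=lambda ff: int(ff["attributes"]["fragment.index"]))
        merged[identifier] = "".join(ff["content"] for ff in ordered)
    return merged


def make(index, content, identifier="parent-1", count=3):
    """Hypothetical helper: build a fake FlowFile with the three attributes."""
    attributes = {
        "fragment.identifier": identifier,
        "fragment.index": str(index),
        "fragment.count": str(count),
    }
    return {"attributes": attributes, "content": content}


# The three paths can finish in any order; the index restores header, body, footer.
arrivals = [make(3, "Z;TRL\n"), make(1, "A;HDR\n"), make(2, "B;1\nC;2\n")]
print(defragment(arrivals)["parent-1"], end="")
```

The point of the sketch is that ordering comes entirely from the attributes, so it does not matter which of the three paths finishes first.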

There are often many ways to solve the same use case using NiFi components. Some design choices are better than others and use fewer resources to accomplish the end goal.

While the above is one solution, I am sure there are others. Cloudera's professional services is a great resource that can help with use case designs.

If you found this response assisted with your query, please take a moment to log in and click on "Accept as Solution" below this post.

Thank you,

Matt