Support Questions

AnnaBea · ‎10-25-2021

Hello!

I'm new to NiFi and struggling with what seems like a simple task.

I have a file with Header, Body and Trailer in fixed width layout.

The header record always starts with an A;

The Trailer record always starts with a Z;

Body records are identified by B, C, D and so on... (except A and Z)

So I first used RouteText to separate Header, Body and Trailer. My goal was to replace the fixed width layout with a delimiter (;) so I can make sense of the info in the file (header, body and trailer will each have a different number of columns once delimited).

Then I used ReplaceText with a regex to create columns delimited by semicolons instead of fixed width.

Now I need to regroup the rows and create a single file again, with the header, body and trailer, but this time all separated by semicolons. This is what I'd like to achieve:
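To illustrate that step outside NiFi, here is a minimal Python sketch of the same idea: classify each line by its first character, then use a regex with one capture group per column to swap the fixed width layout for semicolons. The column widths in `WIDTHS` and the function names are hypothetical, since the real layout is not shown in this post.

```python
import re

# Hypothetical column widths per record type (assumption: the real
# layout of the file is not given in this thread).
WIDTHS = {
    "header": [1, 8, 10],
    "body": [1, 5, 10, 6],
    "trailer": [1, 6],
}


def record_type(line):
    """Classify a line by its first character (A = header, Z = trailer)."""
    if line.startswith("A"):
        return "header"
    if line.startswith("Z"):
        return "trailer"
    return "body"


def to_delimited(line, widths, delimiter=";"):
    """Split a fixed width line with a regex that has one group per column."""
    pattern = "".join(f"(.{{{w}}})" for w in widths)
    match = re.match(pattern, line)
    if match is None:
        raise ValueError(f"line shorter than expected layout: {line!r}")
    # Strip the padding that fixed width fields usually carry.
    return delimiter.join(field.strip() for field in match.groups())


def convert(line):
    """Convert one line using the layout for its record type."""
    return to_delimited(line, WIDTHS[record_type(line)])


if __name__ == "__main__":
    for sample in ["A20211025FILE000001", "B00042Widget    001250", "Z000003"]:
        print(convert(sample))
```

In ReplaceText, the equivalent would be the same kind of grouped pattern as the search value, with the groups joined by semicolons in the replacement value.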

My template looks like this:

Is that possible? I tried using MergeRecord for that but I really don't know how to configure its properties, and it's not merging anything.

MattWho · ‎11-02-2021

@AnnaBea

Let me make sure I am clear on your ask here:

1. You have successfully split your source file into 3 parts (header line, body line(s), and footer line).
2. You have successfully modified all three split files as needed.
3. You are having issues re-assembling the three split files back into one file, in the order header, body, footer, using the MergeRecord processor?

With this particular dataflow design, the MergeRecord processor is not likely what you want to use. You probably want to use the MergeContent processor instead, with a "Merge Strategy" of "Defragment". But getting these three source FlowFiles merged in a specific order requires some additional work in your upstream flow. In order to use "Defragment", your three source FlowFiles would all need to have these FlowFile attributes:

fragment.identifier: All split FlowFiles produced from the same parent FlowFile will have the same randomly generated UUID added for this attribute.
fragment.index: A one-up number that indicates the ordering of the split FlowFiles that were created from a single parent FlowFile.
fragment.count: The number of split FlowFiles generated from the parent FlowFile.

1. Add one UpdateAttribute processor before your RouteText and configure it to create the "fragment.identifier" attribute with a value of "${UUID()}" and another attribute, "fragment.count", with a value of "3". Each FlowFile produced by RouteText should then have these two attributes set on it.
2. Then add one UpdateAttribute processor to each of the 3 flow paths to set the "fragment.index" attribute uniquely for each dataflow path: value=1 for header, value=2 for body, and value=3 for footer.
3. Now the MergeContent will have what it needs to bin these three files by the UUID and merge them in the proper order.
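To make the binning and ordering idea concrete, here is a minimal Python sketch that mimics what the "Defragment" strategy does with those three attributes. It is only a simulation for illustration, not NiFi code: the dictionary layout for a FlowFile and the `defragment` function are hypothetical, and it assumes each fragment's content already ends with a newline so the pieces can be concatenated directly.

```python
from collections import defaultdict


def defragment(flowfiles):
    """Bin fragments by fragment.identifier and merge complete bins.

    Each FlowFile is modeled (hypothetically) as a dict with an
    "attributes" dict and a "content" string.
    """
    bins = defaultdict(list)
    for flowfile in flowfiles:
        bins[flowfile["attributes"]["fragment.identifier"]].append(flowfile)

    merged = {}
    for identifier, fragments in bins.items():
        expected = int(fragments[0]["attributes"]["fragment.count"])
        if len(fragments) != expected:
            continue  # bin is incomplete, so nothing is merged for it
        # Order the fragments by fragment.index before concatenating.
        fragments.sort(key=lambda ff: int(ff["attributes"]["fragment.index"]))
        merged[identifier] = "".join(ff["content"] for ff in fragments)
    return merged


def make_fragment(identifier, index, count, content):
    """Build a simulated FlowFile carrying the three fragment attributes."""
    return {
        "attributes": {
            "fragment.identifier": identifier,
            "fragment.index": str(index),
            "fragment.count": str(count),
        },
        "content": content,
    }


if __name__ == "__main__":
    # Fragments arrive out of order: trailer first, then header, then body.
    arrived = [
        make_fragment("file-1", 3, 3, "Z;000003\n"),
        make_fragment("file-1", 1, 3, "A;20211025;FILE000001\n"),
        make_fragment("file-1", 2, 3, "B;00042;Widget;001250\n"),
    ]
    print(defragment(arrived)["file-1"], end="")
```

Because the ordering comes from fragment.index rather than arrival order, it does not matter which of the three paths finishes first.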

There are often many ways to solve the same use case using NiFi components. Some design choices are better than others and use fewer resources to accomplish the end goal.

While the above is one solution, I am sure there are others. Cloudera's professional services is a great resource that can help with use case designs.

If you found this response assisted with your query, please take a moment to log in and click on "Accept as Solution" below this post.

Thank you,

Matt
