Split data into multiple files using NiFi based on a filter condition

Rising Star

I have a requirement where I have an input text file and need to route the data to different directories based on a filter on the data values, using NiFi. The challenge is that the filter condition is provided at run time, so I have to read it from a config file. Is there any option or processor in NiFi to achieve this?

1 ACCEPTED SOLUTION

Master Guru

It could likely be done with a combination of processors. One part of the flow would read the config file and load the conditions into a DistributedMapCache. Another part would read the input file (GetFile, or ListFile -> FetchFile), possibly split it into individual records (with SplitText), extract the desired values (with ExtractText), then get the conditions from the DistributedMapCache and route to the various paths (with RouteOnAttribute).

If you are comfortable with a programming language like Groovy, Jython, JRuby, Lua, or JavaScript, you could use InvokeScriptedProcessor to accomplish any or all of the above. I'd recommend limiting the script to reading the config file and filtering the data, as the other processors above handle the remaining tasks very well.

If you only need two routes, you can also use ExecuteScript for scripting, but that processor only gives you "success" and "failure" routes. InvokeScriptedProcessor lets you implement a full Processor, so you can define your own relationships/routes. I have some examples (here and here) of InvokeScriptedProcessor, along with many other examples of scripting in NiFi, on my blog.
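
To make the config-driven routing described above more concrete, here is a minimal sketch of just the filtering logic in plain Python. It does not use the NiFi API. The config format (a JSON object mapping a route name to a regular expression), the function names `load_conditions` and `route_lines`, the `unmatched` route, and the sample data are all assumptions made up for illustration; inside NiFi, each route name would correspond to a relationship leading to a different output directory.

```python
import json
import re
from collections import defaultdict


def load_conditions(config_text):
    """Parse a hypothetical JSON config: {route name: regex pattern}."""
    raw = json.loads(config_text)
    return {route: re.compile(pattern) for route, pattern in raw.items()}


def route_lines(lines, conditions, default_route="unmatched"):
    """Assign each line to the first route whose pattern matches it."""
    routed = defaultdict(list)
    for line in lines:
        for route, pattern in conditions.items():
            if pattern.search(line):
                routed[route].append(line)
                break
        else:
            # No condition matched, so keep the line on a fallback route.
            routed[default_route].append(line)
    return dict(routed)


# Made-up config and input, standing in for the files read at run time.
CONFIG = '{"east": ",EAST,", "west": ",WEST,"}'
DATA = "1,EAST,100\n2,WEST,250\n3,NORTH,75\n4,EAST,30"

conditions = load_conditions(CONFIG)
routes = route_lines(DATA.splitlines(), conditions)
for name, rows in routes.items():
    print(name, rows)
```

With this sample input, rows 1 and 4 end up on the `east` route, row 2 on `west`, and row 3 on `unmatched`. Because the conditions come from the config text rather than the code, changing the routing only requires editing the config, which is the behavior the question asks for.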

3 REPLIES

Rising Star

Do I need to do any coding to load the conditions into a DistributedMapCache? If so, in which language? Can you provide a link to a working example of DistributedMapCache? I don't have experience with any of the programming languages you mentioned, so I wanted to know whether the first option you described would require any programming.

Guru

The following HCC How-To shows a NiFi flow whose first steps read and process a config file. I hope it is useful. (Shout-out to @Matt Burgess for initial guidance on this.)

Using NiFi to ingest and transform RSS feeds to HDFS using an external config file