Split data into multiple files using NiFi based on a filter condition
- Labels: Apache Hadoop, Apache NiFi
Created ‎08-01-2016 12:50 PM
I have a requirement where I have an input text file and have to route the data to different directories based on a filter on the data values, using NiFi. The challenge is that the condition will be provided at run time, so I have to read it from a config file. Is there any option or processor in NiFi to achieve this?
Created ‎08-01-2016 12:58 PM
It could likely be done with a combination of processors. One part of the flow reads the config file and loads the conditions into a DistributedMapCache. Another part reads the input file (GetFile, or ListFile -> FetchFile), possibly splits it into individual records (with SplitText), extracts the desired values with ExtractText, then gets the conditions from the DistributedMapCache and routes (with RouteOnAttribute) to the various paths.
If you are comfortable with a programming language like Groovy, Jython, JRuby, Lua, or JavaScript, you could use InvokeScriptedProcessor to accomplish any or all of the above. I'd recommend limiting the script to reading the config file and filtering the data, as the other processors above handle the remaining tasks very well.
If you will only have two routes, you can also use ExecuteScript for scripting, but that processor only gives you "success" and "failure" routes. InvokeScriptedProcessor lets you implement a full Processor so you can define your own relationships/routes. I have some examples (here and here) of InvokeScriptedProcessor, along with many other examples of scripting in NiFi, on my blog.
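To make the "read the config file and filter the data" part more concrete, here is a minimal sketch in plain Python of the logic such a script would carry. It does not use the NiFi API and is not a ready-made processor. The config format (one rule per line: `route_name,column_index,operator,value`), the function names `load_rules` and `route_records`, and the `unmatched` default route are all assumptions made for illustration; the thread does not say what the real config file looks like.

```python
import operator

# Hypothetical rule format (an assumption, not from the thread):
#   route_name,column_index,operator,value
OPS = {"==": operator.eq, "!=": operator.ne, ">": operator.gt, "<": operator.lt}


def load_rules(config_text):
    """Parse rules from config text, skipping blank lines and '#' comments."""
    rules = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        route, column, op, value = [part.strip() for part in line.split(",")]
        rules.append((route, int(column), OPS[op], value))
    return rules


def _compare(op, left, right):
    """Compare numerically when both sides parse as numbers, else as strings."""
    try:
        return op(float(left), float(right))
    except ValueError:
        return op(left, right)


def route_records(data_text, rules, default_route="unmatched"):
    """Group each comma-separated record under the first route whose rule matches."""
    routes = {}
    for line in data_text.splitlines():
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split(",")]
        target = default_route
        for route, column, op, value in rules:
            if column < len(fields) and _compare(op, fields[column], value):
                target = route
                break
        routes.setdefault(target, []).append(line)
    return routes


if __name__ == "__main__":
    config = "# route,column,operator,value\nhigh,2,>,100\neast,1,==,east\n"
    data = "a,east,50\nb,west,150\nc,north,10\n"
    for route, lines in route_records(data, load_rules(config)).items():
        print(route, lines)
```

In a NiFi flow, each key of the returned dictionary would correspond to one relationship (route), and the records grouped under it would be what gets written to that route's directory. Because the rules are re-read from the config text on each call, changing the file changes the routing without touching the script.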
Created ‎08-01-2016 01:46 PM
To load the conditions into a DistributedMapCache, do I need to do any coding? If yes, in which language? Can you provide a link to a working example of DistributedMapCache? I don't have experience in any of the programming languages you mentioned, so I wanted to know whether the first option would require any programming.
Created ‎08-02-2016 09:41 PM
The following HCC How-To shows a NiFi flow whose first steps read and process a config file. I hope it is useful. (Shout-out to @Matt Burgess for initial guidance on this.)
Using NiFi to ingest and transform RSS feeds to HDFS using an external config file
