NiFi/DataFlow example that loops through a list?
Labels: Apache NiFi
Created 09-10-2016 01:44 PM
I'm a total DataFlow/NiFi rookie.
I'm trying to accomplish something like the following:
Given a database table like this
Customer_ID (varchar), DoA (boolean), DoB (boolean), DoC (boolean)
I want to:
1) query the table (select *)
2) for each customer:
3a) if DoA, execute some steps (move some files around, etc)
3b) if DoB, execute some steps
3c) if DoC, execute some steps
4) Update some log files, etc.
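For contrast, here is a minimal sketch of the imperative loop described in steps 1 to 4, using Python's built-in `sqlite3` module and an in-memory table. The table name `customers`, the sample rows, and the log strings are hypothetical. SQLite has no native boolean type, so the flags are stored as 0/1.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table shaped like the one in the question (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customers (Customer_ID TEXT, DoA INTEGER, DoB INTEGER, DoC INTEGER)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?, ?)",
        [("c1", 1, 0, 0), ("c2", 0, 1, 1), ("c3", 0, 0, 0)],
    )
    return conn


def process(conn: sqlite3.Connection) -> list[str]:
    log: list[str] = []
    # Step 1: query the table
    rows = conn.execute(
        "SELECT Customer_ID, DoA, DoB, DoC FROM customers ORDER BY Customer_ID"
    )
    # Step 2: for each customer...
    for customer_id, do_a, do_b, do_c in rows:
        # Steps 3a-3c: independent checks, so more than one branch can run per customer
        if do_a:
            log.append(f"{customer_id}: A")
        if do_b:
            log.append(f"{customer_id}: B")
        if do_c:
            log.append(f"{customer_id}: C")
    # Step 4: the returned list stands in for updating log files
    return log


print(process(build_demo_db()))  # ['c1: A', 'c2: B', 'c2: C']
```

Because the three `if` checks are independent, a customer with both DoB and DoC set runs both branches; a NiFi flow has to express the same choice through routing rather than a loop.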
I've been playing with some of the example templates here: https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
But I haven't found anything to show me how to accomplish step 2 above.
Is it possible to work through a loop like this?
In the NiFi training class, the instructor said that this is a common use case, but I can't seem to find a template that looks like this.
Can someone point me at an example to get me going?
Created 09-10-2016 07:29 PM
If I'm understanding the scenario correctly, you would probably do something like this:
- ExecuteSQL or QueryDatabaseTable to get the data from the database (this produces Avro)
- ConvertAvroToJSON or ConvertAvroToCSV; I'm going to use JSON going forward
- SplitJson to split each record into its own flow file
- EvaluateJsonPath to extract DoA, DoB, and DoC into flow file attributes
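The following is a rough Python model of what the last two steps do to the data. It assumes ConvertAvroToJSON emits the query result as a single JSON array and that SplitJson is configured with a JsonPath expression such as `$.*`. That expression, the sample data, and the helper names are assumptions for illustration, not NiFi code. Each resulting flow file is modeled as a dict holding its content plus its attributes, which are kept as strings because flow file attribute values are strings.

```python
import json

# Hypothetical ConvertAvroToJSON output: one JSON array holding every row
SAMPLE = (
    '[{"Customer_ID": "c1", "DoA": true, "DoB": false, "DoC": false},'
    ' {"Customer_ID": "c2", "DoA": false, "DoB": true, "DoC": true}]'
)


def split_json(content: str) -> list[str]:
    """Rough analog of SplitJson with $.* : one JSON document per array element."""
    return [json.dumps(record) for record in json.loads(content)]


def evaluate_json_path(content: str, fields: list[str]) -> dict[str, str]:
    """Rough analog of EvaluateJsonPath writing each field to a string attribute."""
    record = json.loads(content)
    attrs: dict[str, str] = {}
    for name in fields:
        value = record[name]
        # JSON booleans become the lowercase text "true"/"false"
        attrs[name] = str(value).lower() if isinstance(value, bool) else str(value)
    return attrs


def to_flowfiles(query_result_json: str) -> list[dict]:
    """Split the result, then attach the extracted attributes to each piece."""
    flowfiles = []
    for piece in split_json(query_result_json):
        attrs = evaluate_json_path(piece, ["Customer_ID", "DoA", "DoB", "DoC"])
        flowfiles.append({"content": piece, "attributes": attrs})
    return flowfiles


for ff in to_flowfiles(SAMPLE):
    print(ff["attributes"])
```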
From here it depends on the logic you want and on whether those three fields are mutually exclusive (only one is ever true) or more than one can be true at once. Either way, you would use RouteOnAttribute with a property like DoA = ${DoA:equals("true")}, which sends every flow file that matches to a relationship named DoA. You then connect that relationship to the processors that perform the logic for when DoA is true.
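Here is a minimal sketch of that routing decision. It assumes RouteOnAttribute's "Route to Property name" strategy, in which each dynamic property becomes a relationship and, as we understand it, a copy of the flow file goes to every relationship whose expression is true, while anything that matches nothing goes to `unmatched`. The Python lambdas are hypothetical stand-ins for the Expression Language, not how NiFi evaluates it.

```python
from typing import Callable

Attributes = dict[str, str]

# Each entry mimics one dynamic property on RouteOnAttribute: relationship name -> condition
ROUTES: dict[str, Callable[[Attributes], bool]] = {
    "DoA": lambda a: a.get("DoA") == "true",  # like ${DoA:equals("true")}
    "DoB": lambda a: a.get("DoB") == "true",  # like ${DoB:equals("true")}
    "DoC": lambda a: a.get("DoC") == "true",  # like ${DoC:equals("true")}
}


def route_on_attribute(attrs: Attributes) -> list[str]:
    """Return every relationship the flow file would be sent to, or ['unmatched']."""
    matched = [name for name, condition in ROUTES.items() if condition(attrs)]
    return matched or ["unmatched"]


print(route_on_attribute({"DoA": "false", "DoB": "true", "DoC": "true"}))
```

A customer with both DoB and DoC set ends up on two relationships, which is the flow-based equivalent of running two branches inside one loop iteration.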
You could have a series of RouteOnAttribute processors, or a single one with compound expressions like:
${DoA:equals("true"):and( ${DoB:equals("false")} )}
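The following hypothetical Python translation shows how that compound expression reads. `equals` is an exact, case-sensitive string comparison, and `and` combines the two results, so the expression is true only for flow files where DoA is "true" and DoB is "false". Treating a missing attribute as "never matches" is a simplification made in this sketch, not a statement about NiFi's exact behavior.

```python
from typing import Optional


def el_equals(value: Optional[str], literal: str) -> bool:
    """Exact, case-sensitive comparison; in this sketch a missing attribute never matches."""
    return value is not None and value == literal


def a_but_not_b(attrs: dict[str, str]) -> bool:
    """Python reading of ${DoA:equals("true"):and( ${DoB:equals("false")} )}."""
    return el_equals(attrs.get("DoA"), "true") and el_equals(attrs.get("DoB"), "false")


print(a_but_not_b({"DoA": "true", "DoB": "false"}))
print(a_but_not_b({"DoA": "true", "DoB": "true"}))
```

Properties built this way (for example one per combination you care about) let a single RouteOnAttribute make mutually exclusive decisions, instead of chaining several processors.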
You can take a look at the expression language guide for more detail on constructing the right expressions:
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
Created 09-12-2016 09:28 AM
Thanks Bryan,
I'll give this a try!
Created 09-15-2016 12:52 PM
The key here for me is a shift in thinking.
The SplitJson processor "splits" my query result into X flow files, one per row, and then I can process Y of them at a time. It's not quite a loop (unless Y == 1), but it makes sense now.
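That "Y at a time" behavior is roughly analogous to the Concurrent Tasks setting on a processor's scheduling tab. As an analogy only (plain Python, not NiFi internals), a thread pool whose size plays the role of Y handles the split records concurrently, and with a pool size of 1 it degenerates into an ordinary sequential loop. The record IDs and the simulated work are hypothetical.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class ConcurrencyTracker:
    """Counts how many records are being handled at the same moment."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0

    def enter(self) -> None:
        with self._lock:
            self.active += 1
            self.peak = max(self.peak, self.active)

    def leave(self) -> None:
        with self._lock:
            self.active -= 1


def process_all(records: list[str], concurrent_tasks: int) -> tuple[list[str], int]:
    """Handle every split record with at most `concurrent_tasks` workers (the 'Y')."""
    tracker = ConcurrencyTracker()

    def handle(record: str) -> str:
        tracker.enter()
        try:
            time.sleep(0.01)  # simulated per-record work (moving files, etc.)
            return f"processed {record}"
        finally:
            tracker.leave()

    with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
        # map() returns results in input order even when the work runs concurrently
        results = list(pool.map(handle, records))
    return results, tracker.peak


records = [f"c{i}" for i in range(1, 7)]
print(process_all(records, concurrent_tasks=1))  # sequential: peak is 1
print(process_all(records, concurrent_tasks=3))  # up to 3 at a time
```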
Created 09-15-2016 12:53 PM
(I posted another NiFi question here, if anyone reading this has an answer: https://community.hortonworks.com/questions/56616/options-for-exporting-large-data-sets-from-hive-to...)
