Created 09-10-2016 01:44 PM
I'm a total dataflow/nifi rookie.
I'm trying to accomplish something like the following:
Given a database table like this
Customer_ID (varchar), DoA (boolean), DoB (boolean), DoC (boolean)
I want to:
1) query the table (select *)
2) for each customer:
3a) if DoA, execute some steps (move some files around, etc)
3b) if DoB, execute some steps
3c) if DoC, execute some steps
4) Update some log files, etc.
I've been playing with some of the example templates here: https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates
But I haven't found anything to show me how to accomplish step 2 above.
Is it possible to work through a loop like this?
In the nifi training class, the instructor said that this is a common use case, but I can't seem to find a template that looks like this.
Can someone point me at an example to get me going?
Created 09-10-2016 07:29 PM
If I'm understanding the scenario correctly, you would probably do something like this...
- ExecuteSQL/QueryDatabaseTable to get data from the database, produces Avro
- ConvertAvroToJSON or ConvertAvroToCSV, I'm going to use JSON going forward
- SplitJson to split each record into its own flow file
- EvaluateJsonPath to extract DoA, DoB, and DoC into flow file attributes
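To make the data flow in the steps above concrete, here is a minimal Python sketch of what the split and extract steps do conceptually. It is not NiFi code: the sample rows, the dict-based "flow file", and the helper names `split_records` and `extract_attributes` are all made up for illustration.

```python
import json

# Hypothetical query result, standing in for the database rows after they
# have been converted to JSON: one array holding one object per table row.
query_result = json.dumps([
    {"Customer_ID": "c-001", "DoA": True, "DoB": False, "DoC": True},
    {"Customer_ID": "c-002", "DoA": False, "DoB": True, "DoC": False},
])


def split_records(json_array_text):
    """Mimic the split step: produce one 'flow file' (a dict here) per record."""
    return [
        {"content": json.dumps(record), "attributes": {}}
        for record in json.loads(json_array_text)
    ]


def extract_attributes(flow_file, fields=("DoA", "DoB", "DoC")):
    """Mimic the extract step: copy selected JSON fields into attributes."""
    record = json.loads(flow_file["content"])
    for field in fields:
        # Flow file attributes are strings, so booleans become "true"/"false".
        flow_file["attributes"][field] = json.dumps(record[field])
    return flow_file


flow_files = [extract_attributes(ff) for ff in split_records(query_result)]
for ff in flow_files:
    print(ff["attributes"])
```

Each customer ends up as an independent unit carrying its own DoA/DoB/DoC attributes, which is what the routing step then inspects.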
From here it depends on the logic you want and on whether those three fields are mutually exclusive (only one is ever true) or whether two out of three can be true. Either way, you would use RouteOnAttribute with a property like DoA = ${DoA:equals("true")}. Every flow file that matches is sent to the DoA relationship, and you then connect that relationship to the processors that perform the logic for when DoA is true.
You could have a series of RouteOnAttribute processors, or you could have one with complex statements like:
${DoA:equals("true"):and( ${DoB:equals("false")} )}
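As a rough illustration of how rules like these behave, here is a small Python sketch. It is only an analogy, not NiFi's implementation: the function `route_on_attribute`, the rule names, and the "unmatched" fallback label are assumptions made for the example, and it assumes a flow file can match more than one rule at once.

```python
def route_on_attribute(attributes, rules):
    """Return the name of every rule whose predicate is true for these attributes.

    If nothing matches, return ["unmatched"] (a label chosen for this sketch).
    """
    matched = [name for name, predicate in rules.items() if predicate(attributes)]
    return matched or ["unmatched"]


# Attribute values are strings, mirroring the equals("true") comparisons above.
rules = {
    "DoA": lambda a: a.get("DoA") == "true",
    "DoB": lambda a: a.get("DoB") == "true",
    "DoC": lambda a: a.get("DoC") == "true",
    # Counterpart of ${DoA:equals("true"):and( ${DoB:equals("false")} )}
    "DoA_not_DoB": lambda a: a.get("DoA") == "true" and a.get("DoB") == "false",
}

customer = {"DoA": "true", "DoB": "false", "DoC": "true"}
print(route_on_attribute(customer, rules))
```

If the three flags are not mutually exclusive, one customer can land in several branches, so each branch should be safe to run independently of the others.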
You can take a look at the expression language guide for more detail on constructing the right expressions:
https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html
Created 09-12-2016 09:28 AM
Thanks Bryan,
I'll give this a try!
Created 09-15-2016 12:52 PM
The key here for me is a shift in thinking.
The SplitJson processor "splits" my flow into X flows, based on the results of my query. And then I can run Y of them at a time. It's not quite a loop (unless Y == 1), but it makes sense now.
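For anyone else making the same shift, here is a small Python analogy of "X flows, Y at a time". It is only a sketch: `process_customer` and `run_all` are made-up names, and a thread pool stands in for however many concurrent tasks the downstream processors are allowed.

```python
from concurrent.futures import ThreadPoolExecutor


def process_customer(customer_id):
    """Placeholder for the per-customer steps (moving files, logging, ...)."""
    return f"processed {customer_id}"


def run_all(customer_ids, y):
    """Process X independent items with at most y running at the same time."""
    with ThreadPoolExecutor(max_workers=y) as pool:
        # map() returns results in input order, even when work overlaps.
        return list(pool.map(process_customer, customer_ids))


results = run_all(["c-001", "c-002", "c-003"], y=2)
print(results)
```

With y=1 this degenerates into an ordinary sequential loop, which matches the "unless Y == 1" remark.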
Created 09-15-2016 12:53 PM
I posted another nifi question here, if anyone reading this has an answer: https://community.hortonworks.com/questions/56616/options-for-exporting-large-data-sets-from-hive-to...