NiFi/DataFlow example that loops through a list?

Super Collaborator

I'm a total dataflow/NiFi rookie.

I'm trying to accomplish something like the following:

Given a database table like this

Customer_ID (varchar), DoA (boolean), DoB (boolean), DoC (boolean)

I want to:

1) query the table (select *)

2) for each customer:

3a) if DoA, execute some steps (move some files around, etc)

3b) if DoB, execute some steps

3c) if DoC, execute some steps

4) Update some log files, etc.

I've been playing with some of the example templates here: https://cwiki.apache.org/confluence/display/NIFI/Example+Dataflow+Templates

But I haven't found anything to show me how to accomplish step 2 above.

Is it possible to work through a loop like this?

In the NiFi training class, the instructor said that this is a common use case, but I can't find a template that looks like this.

Can someone point me at an example to get me going?

1 ACCEPTED SOLUTION

Master Guru

If I'm understanding the scenario correctly, you would probably do something like this...

- ExecuteSQL/QueryDatabaseTable to get data from the database, produces Avro

- ConvertAvroToJSON or ConvertAvroToCSV, I'm going to use JSON going forward

- SplitJson to split each record into its own flow file

- EvaluateJsonPath to extract DoA, DoB, and DoC into flow file attributes
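
As a rough illustration of what the split and extract steps do to the data, here is a plain Python sketch (not NiFi code). The sample rows and the function names `split_records` and `extract_flags` are made up for illustration. In NiFi itself, EvaluateJsonPath would need its Destination set to flowfile-attribute, with JsonPath properties such as `$.DoA`.

```python
import json

# Hypothetical query result after the Avro-to-JSON conversion (sample data only).
query_result = json.dumps([
    {"Customer_ID": "C001", "DoA": True, "DoB": False, "DoC": False},
    {"Customer_ID": "C002", "DoA": True, "DoB": True, "DoC": False},
    {"Customer_ID": "C003", "DoA": False, "DoB": False, "DoC": True},
])


def split_records(content):
    """Mimic the split step: one 'flow file' (a JSON string) per array element."""
    return [json.dumps(record) for record in json.loads(content)]


def extract_flags(flow_file_content):
    """Mimic extracting fields into attributes; attribute values are strings."""
    record = json.loads(flow_file_content)
    return {
        name: str(record[name]).lower()  # True -> "true", so equals("true") can match
        for name in ("DoA", "DoB", "DoC")
    }


flow_files = split_records(query_result)
attributes = [extract_flags(ff) for ff in flow_files]

for attrs in attributes:
    print(attrs)
```

Each customer row ends up as its own unit of work carrying its own DoA/DoB/DoC attributes, which is what the routing step operates on.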

From here it depends on the logic you want and on whether those three fields are mutually exclusive (only one is ever true) or whether more than one can be true at once. Either way, you would use RouteOnAttribute with a property like DoA = ${DoA:equals("true")}. Every flow file that matches is sent to the DoA relationship, and you connect that relationship to the processors that should run when DoA is true.

You could have a series of RouteOnAttribute processors, or you could have one with complex statements like:

${DoA:equals("true"):and( ${DoB:equals("false")} )}
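
To make the routing semantics concrete, here is a small Python sketch (not NiFi configuration). It assumes RouteOnAttribute is left on its default "Route to Property name" strategy, where a flow file goes to every relationship whose expression is true and to "unmatched" otherwise. The function name, rule names, and sample attributes are invented for illustration.

```python
def route_on_attribute(attrs, rules):
    """Return every rule name whose predicate is true, or ["unmatched"] if none are."""
    matched = [name for name, predicate in rules.items() if predicate(attrs)]
    return matched or ["unmatched"]


# One property per flag, like DoA = ${DoA:equals("true")}.
separate_rules = {
    "DoA": lambda a: a.get("DoA") == "true",
    "DoB": lambda a: a.get("DoB") == "true",
    "DoC": lambda a: a.get("DoC") == "true",
}

# One compound property, like ${DoA:equals("true"):and( ${DoB:equals("false")} )}.
combined_rules = {
    "A_and_not_B": lambda a: a.get("DoA") == "true" and a.get("DoB") == "false",
}

# Hypothetical customer whose DoA and DoB flags are both set.
customer = {"DoA": "true", "DoB": "true", "DoC": "false"}

print(route_on_attribute(customer, separate_rules))
print(route_on_attribute(customer, combined_rules))
```

With separate rules the same customer can follow both the DoA and DoB paths, while the compound rule lets you exclude customers that have more than one flag set.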

You can take a look at the expression language guide for more detail on constructing the right expressions:

https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html


4 REPLIES

Super Collaborator

Thanks Bryan,

I'll give this a try!

Super Collaborator

The key here for me is a shift in thinking.

The SplitJson processor "splits" my flow into X flows, based on the results of my query. And then I can run Y of them at a time. It's not quite a loop (unless Y == 1), but it makes sense now.
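
For anyone who thinks in code, the "X flows, Y at a time" idea looks roughly like a worker pool. This is only an analogy in Python, not how NiFi is implemented; `process_customer` and the customer IDs are placeholders. In NiFi, I'd assume Y corresponds to the Concurrent Tasks setting on the downstream processors.

```python
from concurrent.futures import ThreadPoolExecutor


def process_customer(customer_id):
    """Placeholder for the per-customer steps (moving files, updating logs, ...)."""
    return f"processed {customer_id}"


customers = [f"C{n:03d}" for n in range(1, 7)]  # X = 6 hypothetical split flow files
Y = 2  # how many are worked on at once; Y = 1 behaves like a plain loop

with ThreadPoolExecutor(max_workers=Y) as pool:
    # map() returns results in input order even though the work overlaps
    results = list(pool.map(process_customer, customers))

print(results)
```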

Super Collaborator

(I posted another nifi question here if anyone reading this has an answer: https://community.hortonworks.com/questions/56616/options-for-exporting-large-data-sets-from-hive-to...