Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Question about multiple columns lookup and chain of lookup

avatar
Rising Star

Dear all,

A bit background about this:

I receive fresh data from a data vendor on a weekly basis and aim to either insert or update my existing database. If the lookup identifies matching values (such as company ID, fiscal year, etc.) in a record, we perform an update on specific fields (e.g., announcement date); otherwise, we insert a new record.

My chain of lookups:

jarviszzzz_0-1706165136495.png

My lookup processor setup:

jarviszzzz_5-1706165054644.pngjarviszzzz_6-1706165066759.pngjarviszzzz_7-1706165075830.png

jarviszzzz_1-1706165176755.png

My data:

Before lookup:

jarviszzzz_1-1706164544571.png

After first lookup (Company ID) (It return me fs_ID):

jarviszzzz_2-1706164571166.png

After second lookup (Fiscal_Year) (It return me another fs_ID):

jarviszzzz_3-1706164616218.png

Note: fs_ID is retrived value from lookup tables

However, I check my lookup table:

jarviszzzz_4-1706164700257.png

It does not match the "Company_ID", it only match the "Fiscal_year", It seems like those lookup processor worked independently. I expected this output: myincomingdata.Company_ID = ent_fundamentals.Company_ID AND myincomingdata.Fiscal_year = ent_fundamentals.Fiscal_year..etc., (I can update this record base on those columns if they all match)

Can anyone please help me this issue or is there any other way in doing lookup inNIFI?

1 ACCEPTED SOLUTION

avatar

Hi @jarviszzzz ,

The lookupRecord processor is not suppose to work in a chain  where the next lookup will filter based on the result of the previous one. Each LookupRecord is independent call to the database and it will return a result from the whole dataset based on the specified key. If you want to lookup record by multiple keys then you probably need to create new column  that join all those key columns values together and use that in your lookupRecord, For example I would create a column called CompanyID_FiscalYear to join both values and use this column as my lookup key. In the dynamic property key of the LookupRecord you can use the concat function to concatenate values from different path:

https://nifi.apache.org/docs/nifi-docs/html/record-path-guide.html#concat

 

If that helps please accept solution.

Thanks

View solution in original post

6 REPLIES 6

avatar
Rising Star

Dear @VidyaSargur, does anyone can help this issue? Thank you so much, much appreciated!!

avatar

Hi @jarviszzzz ,

The lookupRecord processor is not suppose to work in a chain  where the next lookup will filter based on the result of the previous one. Each LookupRecord is independent call to the database and it will return a result from the whole dataset based on the specified key. If you want to lookup record by multiple keys then you probably need to create new column  that join all those key columns values together and use that in your lookupRecord, For example I would create a column called CompanyID_FiscalYear to join both values and use this column as my lookup key. In the dynamic property key of the LookupRecord you can use the concat function to concatenate values from different path:

https://nifi.apache.org/docs/nifi-docs/html/record-path-guide.html#concat

 

If that helps please accept solution.

Thanks

avatar
New Contributor

I also had the same issue here and I’m new to Apache NiFi. Just curious does this way of concatenation a well accepted method for multi-column lookup in NiFi community? I’m asking because because it does not sounds efficient from database perspective where you use more storage space. And then you need to index this concatenated column for lookup speed.

Another consideration is that should we use stored procedure in database (MySQL for example) to do this kind of multi-column lookup? In this way there is no additional column created. But in this way I don’t know how to send these results from stored procedure back to my NiFi flow for subsequent processing.

Thanks and looking forward to your reply!

avatar
Rising Star

Dear @SAMSAL ,

I also have the same concern with @BigJames , Is concatenation the most logical approach in this context? Additionally, how should we handle NULL values? I assume that concatenating NULL values may not be feasible.

Thank you so much in advance!

avatar

@jarviszzzz, @BigJames ,

I understand your concern specially when you have too many columns to check against and large amount of records where indexing can make big difference in response time. It seems that the LookupRecord is intended for simpler cases where you are checking against one key that is not  nullable. For your case I would suggest two options:

 

1- You can use ExecuteSQL\ExecuteSQLRecord instead of LookupRecord. To use this option you need to extract all data used to filter, update or insert into flowfile attributes since the content will change after ExecuteSQL. Using this processor  you can specify the select statement ( in the SQL Select Query property ) with where clause that checks against all fields. The select property allows Expression Language which means you can use flowfile attributes to populate values dynamically. Using SQL and EL also you can handle null values accordingly. Now to check if there is a match or not , you can use simple RouteOnAttribute processor with one condition (dynamic property) :

IsRecordFound: ${fileSize:gt(0)}

the "fileSize" is a system attribute that gives you the size of the flowfile content, If it's 0 then there is no content return which mean no match , otherwise there is a match.

 

2- Defer checking for match (update) and no match (insert) to SQL . I like this approach because you are doing one trip to sql instead of two. This approach also requires you to extract all data into flowfile attributes. Basically you create stored proc where you pass record data as parameters ( or json if your sql can parse json). You can run this stored proc using PutSQL processor. The SQL statement property allows for Expression Language which mean you can pass stored proc parameter values dynamically. Inside the stored proc you can check if the record exist or not and do the proper action. One thing you have to be careful with using this approach is using asynchronous calls to the store proc with multi threading or on a cluster as this might end up in sql in sql deadlock exception which you can retry within nifi or the stored proc itself.

 

I'm looping other experts  like @MattWho , @cotopaul  to see if they can provide other input as well.

Thanks

 

avatar
Rising Star

Dear @SAMSAL ,

Thank you so much for helping, we tried different ways, and we decided to use the solution that you mention above by concatenating those columns.