Created 01-24-2024 10:46 PM
Dear all,
A bit background about this:
I receive fresh data from a data vendor on a weekly basis and aim to either insert or update my existing database. If the lookup identifies matching values (such as company ID, fiscal year, etc.) in a record, we perform an update on specific fields (e.g., announcement date); otherwise, we insert a new record.
My chain of lookups:
My lookup processor setup:
My data:
Before lookup:
After first lookup (Company ID) (It return me fs_ID):
After second lookup (Fiscal_Year) (It return me another fs_ID):
Note: fs_ID is retrived value from lookup tables
However, I check my lookup table:
It does not match the "Company_ID", it only match the "Fiscal_year", It seems like those lookup processor worked independently. I expected this output: myincomingdata.Company_ID = ent_fundamentals.Company_ID AND myincomingdata.Fiscal_year = ent_fundamentals.Fiscal_year..etc., (I can update this record base on those columns if they all match)
Can anyone please help me this issue or is there any other way in doing lookup inNIFI?
Created 01-26-2024 06:43 AM
Hi @jarviszzzz ,
The lookupRecord processor is not suppose to work in a chain where the next lookup will filter based on the result of the previous one. Each LookupRecord is independent call to the database and it will return a result from the whole dataset based on the specified key. If you want to lookup record by multiple keys then you probably need to create new column that join all those key columns values together and use that in your lookupRecord, For example I would create a column called CompanyID_FiscalYear to join both values and use this column as my lookup key. In the dynamic property key of the LookupRecord you can use the concat function to concatenate values from different path:
https://nifi.apache.org/docs/nifi-docs/html/record-path-guide.html#concat
If that helps please accept solution.
Thanks
Created 01-25-2024 06:26 PM
Dear @VidyaSargur, does anyone can help this issue? Thank you so much, much appreciated!!
Created 01-26-2024 06:43 AM
Hi @jarviszzzz ,
The lookupRecord processor is not suppose to work in a chain where the next lookup will filter based on the result of the previous one. Each LookupRecord is independent call to the database and it will return a result from the whole dataset based on the specified key. If you want to lookup record by multiple keys then you probably need to create new column that join all those key columns values together and use that in your lookupRecord, For example I would create a column called CompanyID_FiscalYear to join both values and use this column as my lookup key. In the dynamic property key of the LookupRecord you can use the concat function to concatenate values from different path:
https://nifi.apache.org/docs/nifi-docs/html/record-path-guide.html#concat
If that helps please accept solution.
Thanks
Created 01-28-2024 12:21 AM
I also had the same issue here and I’m new to Apache NiFi. Just curious does this way of concatenation a well accepted method for multi-column lookup in NiFi community? I’m asking because because it does not sounds efficient from database perspective where you use more storage space. And then you need to index this concatenated column for lookup speed.
Another consideration is that should we use stored procedure in database (MySQL for example) to do this kind of multi-column lookup? In this way there is no additional column created. But in this way I don’t know how to send these results from stored procedure back to my NiFi flow for subsequent processing.
Thanks and looking forward to your reply!
Created on 01-28-2024 07:26 PM - edited 01-28-2024 07:28 PM
Created 01-29-2024 08:11 AM
I understand your concern specially when you have too many columns to check against and large amount of records where indexing can make big difference in response time. It seems that the LookupRecord is intended for simpler cases where you are checking against one key that is not nullable. For your case I would suggest two options:
1- You can use ExecuteSQL\ExecuteSQLRecord instead of LookupRecord. To use this option you need to extract all data used to filter, update or insert into flowfile attributes since the content will change after ExecuteSQL. Using this processor you can specify the select statement ( in the SQL Select Query property ) with where clause that checks against all fields. The select property allows Expression Language which means you can use flowfile attributes to populate values dynamically. Using SQL and EL also you can handle null values accordingly. Now to check if there is a match or not , you can use simple RouteOnAttribute processor with one condition (dynamic property) :
IsRecordFound: ${fileSize:gt(0)}
the "fileSize" is a system attribute that gives you the size of the flowfile content, If it's 0 then there is no content return which mean no match , otherwise there is a match.
2- Defer checking for match (update) and no match (insert) to SQL . I like this approach because you are doing one trip to sql instead of two. This approach also requires you to extract all data into flowfile attributes. Basically you create stored proc where you pass record data as parameters ( or json if your sql can parse json). You can run this stored proc using PutSQL processor. The SQL statement property allows for Expression Language which mean you can pass stored proc parameter values dynamically. Inside the stored proc you can check if the record exist or not and do the proper action. One thing you have to be careful with using this approach is using asynchronous calls to the store proc with multi threading or on a cluster as this might end up in sql in sql deadlock exception which you can retry within nifi or the stored proc itself.
I'm looping other experts like @MattWho , @cotopaul to see if they can provide other input as well.
Thanks
Created 02-25-2024 07:13 PM
Dear @SAMSAL ,
Thank you so much for helping, we tried different ways, and we decided to use the solution that you mention above by concatenating those columns.