Support Questions

Find answers, ask questions, and share your expertise

Enrich flowfile with in memory look-up dataset

Rising Star

I have a flow file that I want to enrich with the contents from a list of JSON values based on the "id".  Basically perform an inner join of the flow file with this look-up data set.  However, I need the look-up dataset to be in memory, then using the "id", query the look-up dataset and append the results to the current flow file.  Here is an example:

 

Incoming flow file:

 

{
   "id": "abc123",
   "fname": "The",
   "lname": "Rock"
}

 

 

Contents of the look-up data set:

 

[
   {
     "id": "abc123",
     "dob": "03/09/1977",
     "phone": "987-654-0001"
   },
   {
     "id": "def765",
     "dob": "04/08/1976",
     "phone": "789-654-0001"
   },
   {
     "id": "hij765",
     "dob": "05/06/1975",
     "phone": "456-654-0001"
   }
]

 

 

Enriched flow file:

{
   "id": "abc123",
   "fname": "The",
   "lname": "Rock",
   "dob": "03/09/1977",
   "phone": "987-654-0001"
}

 

I need to be able to look up the correct record in the look-up dataset based on the "id" then append the values to the current flow file.

 

The key here is that I need the look-up dataset to reside in memory (can't be a file or a database)

 

Thanks for reading and look forward to hearing back with ideas.

1 ACCEPTED SOLUTION

Super Collaborator

If you are using Nifi 1.16 or higher I would also refer you  to the ForEnrichment & JoinEnrichment processors that can help you with what you are trying to do. I think you can use those processor regardless if you read the lookup dataset  directly from HTTP or after you loaded into the DistributedMapCache:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.12.1/org.apach...

Hope that helps.

View solution in original post

4 REPLIES 4

Super Collaborator

Hi,

Before trying to answer your question I'm trying to understand how are you planning to populate the lookup dataset in "memory" ? Are you thinking of doing it manually in case you have limited number of lookups as you would do with "SimpleKeyValueLookupService" or you just want to read it once from a file\db and populate it in some lookup service like "DistributedMapCacheLookupService" ?

 

Rising Star

Great questions @SAMSAL 

I would like to do something like storing the JSON in a parameter contexts variable then using a Lookup service to retrieve the corresponding record.

I think of it as an in-memory table that I can use to perform inner joins with flow files.

 

Alternatively, I could read the table from a remote source (using InvokeHTTP) and then load it into a DistributedMapCacheLookupService, but I'm not familiar with this approach so I'd have to do some research. 

 

I appreciate your time.  Thank you.

Super Collaborator

If you are using Nifi 1.16 or higher I would also refer you  to the ForEnrichment & JoinEnrichment processors that can help you with what you are trying to do. I think you can use those processor regardless if you read the lookup dataset  directly from HTTP or after you loaded into the DistributedMapCache:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.12.1/org.apach...

Hope that helps.

Rising Star

Thank you @SAMSAL  I will investigate these processors; I had no idea they existed.  Thank you!