<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Enrich flowfile with in memory look-up dataset in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/362697#M238828</link>
    <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp; I will investigate these processors; I had no idea they existed.&amp;nbsp; Thank you!&lt;/P&gt;</description>
    <pubDate>Tue, 31 Jan 2023 19:01:19 GMT</pubDate>
    <dc:creator>ChuckE</dc:creator>
    <dc:date>2023-01-31T19:01:19Z</dc:date>
    <item>
      <title>Enrich flowfile with in memory look-up dataset</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/362600#M238810</link>
      <description>&lt;P&gt;I have a flow file that I want to enrich with the contents from a list of JSON values based on the "id".&amp;nbsp; Basically perform an inner join of the flow file with this look-up data set.&amp;nbsp; However, I need the look-up dataset to be in memory, then using the "id", query the look-up dataset and append the results to the current flow file.&amp;nbsp; Here is an example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Incoming flow file:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="javascript"&gt;{
   "id": "abc123",
   "fname": "The",
   "lname": "Rock"
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Contents of the look-up data set:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="javascript"&gt;[
   {
     "id": "abc123",
     "dob": "03/09/1977",
     "phone": "987-654-0001"
   },
   {
     "id": "def765",
     "dob": "04/08/1976",
     "phone": "789-654-0001"
   },
   {
     "id": "hij765",
     "dob": "05/06/1975",
     "phone": "456-654-0001"
   }
]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Enriched flow file:&lt;/P&gt;&lt;LI-CODE lang="javascript"&gt;{
   "id": "abc123",
   "fname": "The",
   "lname": "Rock",
   "dob": "03/09/1977",
   "phone": "987-654-0001"
}&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I need to be able to look up the correct record in the look-up dataset based on the "id" then append the values to the current flow file.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The key here is that I need the look-up dataset to reside in memory (can't be a file or a database)&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks for reading and look forward to hearing back with ideas.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jan 2023 05:59:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/362600#M238810</guid>
      <dc:creator>ChuckE</dc:creator>
      <dc:date>2023-01-31T05:59:18Z</dc:date>
    </item>
    <item>
      <title>Re: Enrich flowfile with in memory look-up dataset</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/362686#M238824</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;Before trying to answer your question, I'd like to understand how you are planning to populate the lookup dataset in "memory". Are you thinking of entering it manually, in case you have a limited number of lookups, as you would with "&lt;SPAN&gt;SimpleKeyValueLookupService&lt;/SPAN&gt;", or do you just want to read it once from a file or database and load it into a lookup service such as "&lt;SPAN&gt;DistributedMapCacheLookupService&lt;/SPAN&gt;"?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jan 2023 16:04:40 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/362686#M238824</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2023-01-31T16:04:40Z</dc:date>
    </item>
    <item>
      <title>Re: Enrich flowfile with in memory look-up dataset</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/362695#M238826</link>
      <description>&lt;P&gt;Great questions&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would like to do something like storing the JSON in a Parameter Context parameter and then using a lookup service to retrieve the corresponding record.&lt;/P&gt;&lt;P&gt;I think of it as an in-memory table that I can use to perform inner joins with flow files.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Alternatively, I could read the table from a remote source (using InvokeHTTP) and then load it into a DistributedMapCacheLookupService, but I'm not familiar with this approach, so I'd have to do some research.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I appreciate your time.&amp;nbsp; Thank you.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jan 2023 17:56:53 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/362695#M238826</guid>
      <dc:creator>ChuckE</dc:creator>
      <dc:date>2023-01-31T17:56:53Z</dc:date>
    </item>
    <item>
      <title>Re: Enrich flowfile with in memory look-up dataset</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/362696#M238827</link>
      <description>&lt;P&gt;If you are using NiFi 1.16 or higher, I would also refer you to the ForkEnrichment and JoinEnrichment processors, which can help with what you are trying to do. I think you can use those processors regardless of whether you read the lookup dataset directly over HTTP or after you have loaded it into the DistributedMapCache:&lt;/P&gt;&lt;P&gt;&lt;A href="https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.12.1/org.apache.nifi.processors.standard.JoinEnrichment/additionalDetails.html" target="_blank"&gt;https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.12.1/org.apache.nifi.processors.standard.JoinEnrichment/additionalDetails.html&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Hope that helps.&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jan 2023 18:16:52 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/362696#M238827</guid>
      <dc:creator>SAMSAL</dc:creator>
      <dc:date>2023-01-31T18:16:52Z</dc:date>
    </item>
    <item>
      <title>Re: Enrich flowfile with in memory look-up dataset</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/362697#M238828</link>
      <description>&lt;P&gt;Thank you&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp; I will investigate these processors; I had no idea they existed.&amp;nbsp; Thank you!&lt;/P&gt;</description>
      <pubDate>Tue, 31 Jan 2023 19:01:19 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/362697#M238828</guid>
      <dc:creator>ChuckE</dc:creator>
      <dc:date>2023-01-31T19:01:19Z</dc:date>
    </item>
    <item>
      <title>Re: Enrich flowfile with in memory look-up dataset</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/375904#M242701</link>
      <description>&lt;P&gt;Below is a Python script that demonstrates this process using your example:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;import json

# Define the lookup dataset in memory as a list of dictionaries
lookup_data = [
    {
        "id": "abc123",
        "dob": "03/09/1977",
        "phone": "987-654-0001"
    },
    {
        "id": "def765",
        "dob": "04/08/1976",
        "phone": "789-654-0001"
    },
    {
        "id": "hij765",
        "dob": "05/06/1975",
        "phone": "456-654-0001"
    }
]

# Function to perform the enrichment
def enrich_flow_file(flow_file):
    try:
        flow_data = json.loads(flow_file)

        # Perform a lookup based on the "id" field
        for item in lookup_data:
            if item["id"] == flow_data["id"]:
                # Merge the matched record into the flow file data
                flow_data.update(item)
                enriched_flow_file = json.dumps(flow_data)
                return enriched_flow_file

        # No record with a matching "id": nothing to join
        return None

    except Exception as e:
        # Handle any exceptions that may occur during processing
        return str(e)

# Example usage:
incoming_flow_file = '''
{
    "id": "abc123",
    "fname": "The",
    "lname": "Rock"
}
'''

enriched_flow_file = enrich_flow_file(incoming_flow_file)
print(enriched_flow_file)&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This script defines the lookup dataset in memory as a list of dictionaries and defines a function, enrich_flow_file, that takes an incoming flow file as input, performs the lookup based on the "id" field, and appends the matched data to the flow file. If no record matches, the function returns None; if processing fails, it returns the error message as a string. You can adapt this script to your specific use case and integrate it into your data processing pipeline.&lt;/P&gt;</description>
      <pubDate>Fri, 08 Sep 2023 18:30:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/375904#M242701</guid>
      <dc:creator>FlorianHaas</dc:creator>
      <dc:date>2023-09-08T18:30:20Z</dc:date>
    </item>
    <item>
      <title>Re: Enrich flowfile with in memory look-up dataset</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/393196#M248368</link>
      <description>&lt;P&gt;Can I keep the JSON file as a constant in NiFi itself, so that I can avoid the InvokeHTTP call? Every time I receive flow data, I just need to take the id from the flow data, use it to find the corresponding entry in the constant JSON file, and append those values to the flow data.&lt;/P&gt;</description>
      <pubDate>Mon, 09 Sep 2024 03:38:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Enrich-flowfile-with-in-memory-look-up-dataset/m-p/393196#M248368</guid>
      <dc:creator>Rawther</dc:creator>
      <dc:date>2024-09-09T03:38:35Z</dc:date>
    </item>
  </channel>
</rss>

