Member since 02-10-2023 · 5 Posts · 1 Kudos Received · 0 Solutions
05-24-2024
12:31 PM
NiFi can be a powerful tool for orchestrating data flows relevant to B2B data building. While it excels at transferring data between CDP instances using processors like ListHDFS and PutHDFS, additional configuration may be needed to handle data deletion for B2B sources. For B2B data building you will likely be acquiring data from external sources rather than HDFS; NiFi can still be leveraged, but the specific processors you use will depend on the data format and source. The challenge of handling data deletion remains either way, and HDFS-specific tooling such as GetHDFSEvents will not apply to external B2B data sources.
04-29-2024
11:43 AM
1 Kudo
To retrieve records added after CRM enrichment since the last time you polled an Impala table, use a SQL query with a WHERE clause on a Date/Timestamp column, selecting records whose timestamp is greater than the last one you retrieved. A basic example, assuming your timestamp column is named timestamp_column:

SELECT * FROM your_table WHERE timestamp_column > 'last_poll_timestamp';

Replace 'last_poll_timestamp' with the timestamp value you stored from your last poll, and make sure its format matches the format stored in the table. Step by step:

1. Store the timestamp of the last poll in your source system.
2. Use that timestamp to construct the SQL query, ensuring you only retrieve records with timestamps greater than it.
3. Execute the query against your Impala table to retrieve the new records.
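The steps above can be sketched in a few lines of Python. This is a minimal sketch only: the table name, column name, and timestamp value are hypothetical placeholders, and in production you would pass the timestamp as a bound query parameter through your Impala client rather than formatting it into the string.

```python
def build_incremental_query(table, ts_column, last_poll_ts):
    """Build a SELECT that returns only rows newer than the last poll.

    Assumes the timestamp is stored (and compared) as a
    'YYYY-MM-DD HH:MM:SS' string literal, which Impala accepts.
    """
    return (
        f"SELECT * FROM {table} "
        f"WHERE {ts_column} > '{last_poll_ts}'"
    )

# Value persisted from the previous polling cycle (illustrative)
last_poll = "2024-04-29 11:00:00"
query = build_incremental_query("leads_enriched", "updated_at", last_poll)
print(query)
```

After each successful poll you would overwrite the stored timestamp with the maximum value seen in the batch, so the next cycle picks up where this one left off.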
09-01-2023
01:37 AM
Below is a Python script that demonstrates this process using your example:

import json

# Define the lookup dataset in memory as a list of dictionaries
lookup_data = [
    {"id": "abc123", "dob": "03/09/1977", "phone": "987-654-0001"},
    {"id": "def765", "dob": "04/08/1976", "phone": "789-654-0001"},
    {"id": "hij765", "dob": "05/06/1975", "phone": "456-654-0001"},
]

# Function to perform the enrichment
def enrich_flow_file(flow_file):
    try:
        flow_data = json.loads(flow_file)
        # Perform a lookup based on the "id" field
        for item in lookup_data:
            if item["id"] == flow_data["id"]:
                # Merge the matched data into the flow file
                flow_data.update(item)
                return json.dumps(flow_data)
        # No match found: return the flow file unchanged
        return flow_file
    except Exception as e:
        # Handle any exceptions that may occur during processing
        return str(e)

# Example usage:
incoming_flow_file = '''
{
  "id": "abc123",
  "fname": "The",
  "lname": "Rock"
}
'''

enriched_flow_file = enrich_flow_file(incoming_flow_file)
print(enriched_flow_file)
This script defines the lookup dataset in memory as a list of dictionaries, along with a function enrich_flow_file that takes an incoming flow file, performs the lookup on the "id" field, and merges the matched data into the flow file (returning it unchanged if no match is found). You can adapt this script to your specific use case and integrate it into your data processing pipeline.
02-13-2023
06:59 AM
Yes, you can achieve the same lead data enrichment using the RestLookupService. The format the API should return depends on the specific requirements of your use case, but it typically should be one that can be easily processed and integrated into your data flow. For simple key-value lookups, the API could return JSON with a structure similar to:

{ "key1": "value1", "key2": "value2", ... }

You can then use the values returned from the API to update or add fields in your data flow by mapping the keys in the API response to fields in your data. It's important to note that the RestLookupService can handle a variety of data formats, including JSON, XML, or even plain text; the choice of format depends on your requirements and the capabilities of the API you are using.
02-10-2023
02:34 AM
To perform a data lookup using MS SQL as the datasource in NiFi, you can use the LookupRecord processor along with a DBCPConnectionPool and a ScriptedLookupService. Here is an outline of the steps:

1. Create a DBCPConnectionPool service: In the NiFi UI, click the gear icon to open the Controller Services, click the "+" button, and select "DBCPConnectionPool". Configure the connection pool with the details of your MS SQL database, such as the driver class name, URL, username, and password.

2. Create a ScriptedLookupService: In the same Controller Services screen, click the "+" button and select "ScriptedLookupService". In the ScriptedLookupService you write the script that performs the lookup; from within the script you obtain the DBCPConnectionPool created in step 1 and run your query against it. You can refer to the example code in the forum topic "How to get DBCP service inside ScriptedLookupService".

3. Configure the LookupRecord processor: Add a LookupRecord processor to your data flow and, in its Properties tab, set the "Lookup Service" to the ScriptedLookupService you created in step 2. Connect the LookupRecord processor into your data flow so the incoming flow files contain the records you want to enrich.
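The lookup the scripted service performs can be sketched in plain Python. This is a stand-in sketch, not NiFi code: sqlite3 stands in for the MS SQL database the DBCPConnectionPool would point at, and the table and column names are hypothetical.

```python
import sqlite3

# In-memory stand-in for the MS SQL table reached via the connection pool
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id TEXT PRIMARY KEY, dob TEXT, phone TEXT)")
conn.execute("INSERT INTO customers VALUES ('abc123', '03/09/1977', '987-654-0001')")

def lookup(record):
    """Mimic the scripted lookup: fetch a row by key and merge it into the record."""
    row = conn.execute(
        "SELECT dob, phone FROM customers WHERE id = ?", (record["id"],)
    ).fetchone()
    if row is None:
        return record  # no match: pass the record through unchanged
    return {**record, "dob": row[0], "phone": row[1]}

print(lookup({"id": "abc123", "fname": "The", "lname": "Rock"}))
```

In the real flow, LookupRecord hands each record's key to the ScriptedLookupService, which runs a query like this through the DBCPConnectionPool and returns the fields to merge.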