05-28-2024 07:00 AM
Thank you so much for your clear response. As I am new to NiFi, my flow looks like the following. My primary concern is to ensure that the entire NiFi flow runs successfully without any errors and that all data extracted from the database table is accurately written to HDFS. The reason behind this concern is that I want to trigger a Spark job using a scheduler such as Apache Airflow immediately after the data ingestion process completes.

Flow Overview:
GenerateTableFetch -> ExecuteSQL -> PartitionRecord -> PutHDFS -> PutEmail

Specific Concerns:
1. Flow Completion:
- Error-Free Execution: How can I ensure that the entire flow runs without any errors from start to finish? If any error occurs, I want to be able to see it in the logs, and eventually in the Airflow logs.
- Verification of Data Written: How can I confirm that all the data extracted from the table is successfully written to HDFS?
2. Integration with Scheduler:
- Triggering Spark Job: What are the best practices for triggering a Spark job in Apache Airflow right after confirming that the data ingestion process has completed successfully? A rough sketch of the hand-off I have in mind is shown below.

Thank you for your assistance!
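For illustration, here is a minimal Airflow sketch of what I am imagining: NiFi writes a marker file (e.g. `_SUCCESS`) after `PutHDFS` completes, a sensor waits for it, and only then is the Spark job submitted. The connection IDs, paths, schedule, and the marker-file convention are all assumptions on my part, not an existing setup.

```python
# Hypothetical hand-off sketch: wait for a completion marker written by
# NiFi, then launch the Spark job. Connection IDs, paths, and the marker
# convention are placeholders, not an existing configuration.
from datetime import datetime

from airflow import DAG
from airflow.providers.apache.hdfs.sensors.web_hdfs import WebHdfsSensor
from airflow.providers.apache.spark.operators.spark_submit import SparkSubmitOperator

with DAG(
    dag_id="trigger_spark_after_nifi_ingest",
    start_date=datetime(2024, 5, 1),
    schedule_interval="@daily",  # assumed cadence; could also be triggered manually
    catchup=False,
) as dag:
    # Block until NiFi drops the marker file signalling the ingest finished.
    wait_for_ingest = WebHdfsSensor(
        task_id="wait_for_ingest_marker",
        filepath="/data/ingest/my_table/_SUCCESS",  # assumed marker path
        webhdfs_conn_id="webhdfs_default",
        poke_interval=60,       # check once a minute
        timeout=60 * 60,        # give up after an hour
    )

    # Submit the downstream Spark job once the marker exists.
    run_spark_job = SparkSubmitOperator(
        task_id="run_spark_job",
        application="/opt/jobs/process_ingested_data.py",  # assumed job path
        conn_id="spark_default",
    )

    wait_for_ingest >> run_spark_job
```

Would a marker-file approach like this be considered good practice, or is there a better completion signal to hand from NiFi to Airflow?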
05-28-2024 03:54 AM
Hello Community,

I am working on a data ingestion pipeline using Apache NiFi, and I am using the `ExecuteSQL` processor to read data from a database table and ingest it into my data flow. I want to ensure that all records from the table have been successfully ingested, but I am unsure how to verify this within NiFi. Here are the details of my current setup and the specific challenges I am facing:

Current Setup:
1. Database Source: I am querying a relational database.
2. NiFi Processor: I am using the `ExecuteSQL` processor to execute a SQL query and fetch data from the table.
3. Downstream Processors: The flow files generated by `ExecuteSQL` are processed by subsequent processors such as `ConvertRecord`, `PutHDFS`, etc.

Specific Challenges:
1. Ensuring Completeness: How can I verify that all rows from the table have been ingested? Is there a mechanism within NiFi to confirm that no records have been missed?
2. Record Counting: Is there a way to count the number of records fetched by `ExecuteSQL` and compare it to the total number of records in the database table to ensure completeness? (A rough sketch of the kind of check I have in mind follows this post.)
3. Error Handling: What are the best practices for handling errors or retries in case the ingestion process is interrupted or fails partway through?
4. Monitoring and Logging: How can I effectively monitor and log the progress of data ingestion to track the number of records ingested vs. the total number of records in the table?

Desired Outcome:
- A method to compare the record count ingested by NiFi with the actual record count in the source table.
- Best practices for setting up error handling and retries in the event of ingestion failures.
- Recommendations for monitoring and logging to ensure the completeness and integrity of the data ingestion process.

Example Scenario: For example, if my source table contains 100,000 records, I want to be confident that all 100,000 records have been processed and ingested by NiFi. Additionally, if the process fails after ingesting 50,000 records, I would like to know how to handle such situations to ensure data consistency.

Any advice, examples, or best practices from the community would be greatly appreciated. Thank you!

Best regards,
Mohammed NAJB
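For context, this is a minimal sketch of the reconciliation check I have in mind, written as a PySpark job since I plan to run Spark downstream anyway. It assumes `ExecuteSQL` wrote Avro files (its default output format) to HDFS, that the spark-avro package and the JDBC driver jar are on the classpath, and every URL, path, and table name is a placeholder.

```python
# Hypothetical reconciliation sketch: compare the source-table row count
# with the number of rows NiFi landed in HDFS. All connection details,
# paths, and table names below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ingest_reconciliation").getOrCreate()

# Count rows in the source table via JDBC (subquery must be aliased).
source_count = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/mydb")  # placeholder URL
    .option("dbtable", "(SELECT COUNT(*) AS cnt FROM my_table) q")
    .option("user", "nifi")
    .option("password", "***")
    .load()
    .collect()[0]["cnt"]
)

# Count rows actually written by PutHDFS (assumes Avro output from ExecuteSQL;
# requires the spark-avro package).
ingested_count = (
    spark.read.format("avro").load("hdfs:///data/ingest/my_table/").count()
)

if source_count == ingested_count:
    print(f"OK: all {source_count} rows ingested")
else:
    print(f"MISMATCH: source={source_count}, ingested={ingested_count}")
```

Is a post-hoc comparison like this reasonable, or is there a more idiomatic way to do this inside NiFi itself?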
Labels:
- Apache NiFi