@mohammed_najb It is impossible to guarantee a flow will always run error free, so you need to plan and design for handling failure. How are you handling the "failure" relationships on your ExecuteSQL and PutHDFS processors? PutHDFS will either succeed or route the FlowFile to its "failure" relationship or roll back the session. NiFi does not automatically remove FlowFiles; it is the responsibility of the dataflow designer to handle failures to avoid data loss. For example, do not auto-terminate any relationship to which a FlowFile may get routed.

I don't know what the "best practice" would be, as that comes with testing. Since you are using the GenerateTableFetch processor, it writes attributes to its output FlowFiles, one of which is "fragment.count". You could potentially use this to track that all records are written to HDFS successfully. Look at the stateful usage options of UpdateAttribute. This would allow you to set up RouteOnAttribute to route the last FlowFile, once the stateful count equals "fragment.count", to a processor that triggers your Spark job (see the rough sketch at the end of this post). Just a suggestion, but others in the community may have other flow design options.

Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to log in and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
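A rough sketch of the stateful counter idea referenced above. The attribute name "fragment.counter" and the exact Expression Language are illustrative only (not taken from your flow), so verify them against your NiFi version before relying on them:

UpdateAttribute (Store State = "Store state locally"), fed by the PutHDFS "success" relationship:
    fragment.counter = ${getStateValue('fragment.counter'):replaceNull('0'):toNumber():plus(1)}

RouteOnAttribute (Routing Strategy = "Route to Property name"):
    allFragmentsWritten = ${fragment.counter:equals(${fragment.count})}

Route the "allFragmentsWritten" relationship to whatever processor launches your Spark job (for example ExecuteStreamCommand), and route "unmatched" FlowFiles onward or terminate them once they have been counted. Keep in mind the locally stored state persists across runs, so you would also need a way to reset the counter (or scope it per "fragment.identifier") between table fetches.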