How to fail the whole pipeline whenever there is a failure in NiFi?

Hi, I am receiving multiple CSV files as input, and every file has a date column (Joining_Date). The requirement is to validate that each date is in the proper format. The files should pass if all records are valid, and the whole pipeline should fail if any record in any file is bad.

1 ACCEPTED SOLUTION

Master Mentor
@rajat puchnanda

NiFi is generally designed for independent FlowFile operation. A NiFi processor component is designed to execute against a FlowFile with no regard to other FlowFiles in the flow or coming into the flow.

NiFi did, however, introduce some processors that can help achieve the logic you are looking for, with some limitations. The processors that allow such logic in a dataflow design are the Wait and Notify processors.

The Wait processor is designed to allow FlowFiles to pass only when a release signal is matched. The Notify processor is responsible for setting that release signal.

Question 1: You stated that you have multiple CSV files and that each of those CSV files contains "date_column". Is there a single line in each incoming CSV file, or is each file a multi-line CSV file where the date in the date_column of every line needs to be validated?

Question 2: Are you trying to fail just a single file if any one line in it fails to validate against the date_column, or are you trying to fail every file if the date column fails to validate in any one of the files?

Question 3: Do you know how many files are going to be processed? Even with the Wait and Notify processors, you need to be able to tell the NiFi dataflow how many files are expected; otherwise, the flow has no way of knowing whether the batch is complete and the files can be released downstream.
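To make the point about the expected file count concrete, here is a minimal conceptual sketch in plain Python. It is not NiFi code and does not use any NiFi API; `SignalStore`, `run_batch`, and the `batch-1` signal id are hypothetical names made up for illustration. It only models the idea that each validated file sends one notification and that waiting files are released only once the signal count reaches the expected total and nothing has failed.

```python
from collections import defaultdict


class SignalStore:
    """Toy stand-in for the shared state a notifier writes and a waiter reads.

    Hypothetical class for illustration only; not a NiFi API.
    """

    def __init__(self):
        self._counts = defaultdict(int)

    def notify(self, signal_id, delta=1):
        # Increment the counter for this release signal.
        self._counts[signal_id] += delta
        return self._counts[signal_id]

    def can_release(self, signal_id, target_count):
        # Release only once the expected number of signals has arrived.
        return self._counts[signal_id] >= target_count


def run_batch(results, expected_files, signal_id="batch-1"):
    """results is a list of (file_name, passed_validation) pairs.

    Returns the file names released downstream, or an empty list when the
    batch is incomplete or any file failed validation.
    """
    store = SignalStore()
    waiting = []
    failed = False
    for name, passed in results:
        waiting.append(name)       # every file is held until the batch is decided
        if passed:
            store.notify(signal_id)
        else:
            failed = True          # one bad file fails the whole batch
    if not failed and store.can_release(signal_id, expected_files):
        return waiting
    return []
```

Without `expected_files`, `can_release` would have nothing to compare the counter against, which is why the batch size has to be known up front.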

The solution here may be multipart, but it all starts with knowing exactly how many files you are dealing with.

Depending on the answers to the above questions, you may be able to accomplish this using the link below as a reference only:

https://gist.github.com/ijokarumawak/375915c45071c7cbfddd34d5032c8e90
*** This covers incoming CSV files where each CSV contains only a single date_column field that needs to be validated. You must know the number of incoming FlowFiles in your batch.

Or you may find yourself needing to employ the concepts covered in the link below in conjunction with the link above:

http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/

*** This covers a two-phase Wait and Notify for the scenario where each CSV has multiple lines, each with a date_column to be validated. While the article talks about two-phase splitting, here phase one would instead be based on the first link above, and the second Wait/Notify loop would be based on splitting each CSV into individual FlowFiles, one per line, to be evaluated.
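As a rough illustration of the two levels of checking described above, here is a small plain-Python sketch (again, not NiFi code). The inner check works per line within one CSV; the outer check works per file within the batch. The column name Joining_Date comes from the question; the date format `%Y-%m-%d` and the function names are assumptions for illustration, since the thread does not state the expected format.

```python
import csv
import io
from datetime import datetime

DATE_COLUMN = "Joining_Date"  # column name taken from the question
DATE_FORMAT = "%Y-%m-%d"      # ASSUMPTION: the thread does not state the format


def is_valid_date(value):
    """Return True if value parses with DATE_FORMAT."""
    try:
        datetime.strptime(value.strip(), DATE_FORMAT)
        return True
    except ValueError:
        return False


def file_is_valid(csv_text):
    """Inner level: every line of one CSV must carry a valid date."""
    rows = csv.DictReader(io.StringIO(csv_text))
    # A missing column or short row yields None, which is treated as invalid.
    return all(is_valid_date(row.get(DATE_COLUMN) or "") for row in rows)


def batch_is_valid(files, expected_files):
    """Outer level: the batch passes only if it is complete and every file passes.

    files maps a file name to its CSV text.
    """
    if len(files) != expected_files:
        return False  # batch incomplete, so nothing can be released yet
    return all(file_is_valid(text) for text in files.values())
```

Note that in this sketch a file containing only a header row counts as valid, because there are no lines to reject; whether that is acceptable depends on your requirement.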

Thank you,

Matt

2 REPLIES 2

Master Mentor

Conceptually the flow might look something like this:

[Screenshot of the example flow: 79475-screen-shot-2018-07-13-at-85736-am.png]

Thank you,

Matt