Reject invalid CSV files

I need to create a flow that rejects a CSV file if it contains any invalid records, as determined by the ValidateRecord processor. I only want to report the invalid records and stop there; the valid records should not be processed.

Could someone please help with the flow?

1 ACCEPTED SOLUTION

@nuxeo-nifi,

What I would try as a quick solution:
1. Configure ValidateRecord (or possibly ValidateCsv) so that it identifies the records in your CSV that are not valid.


2. ValidateRecord has 3 outgoing relationships:

  • failure --> connect this to an alert system, for example PutEmail.
  • valid --> connect this to your further processing.
  • invalid --> this is the one you are looking for. Here you can use InvokeHTTP to call NiFi's REST API and stop your ValidateRecord processor, so that a single rejected file stops the entire flow. This is not the best way to do things, but if it is a project requirement, this is how you can do it.

 

2a. ValidateCsv has 2 outgoing relationships:

  • valid --> connect this to your further processing.
  • invalid --> this is the one you are looking for. Here you can use InvokeHTTP to call NiFi's REST API and stop your ValidateCsv processor, so that a single rejected file stops the entire flow. This is not the best way to do things, but if it is a project requirement, this is how you can do it.


3. If you run this flow in a streaming fashion (for example, you receive files every second), change the Run Schedule of ValidateRecord to something like 2 or 5 seconds, so that InvokeHTTP has time to stop the processor. With the default Run Schedule of 0 sec, some additional files will be processed before the processor is actually stopped.
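
To make the "stop the processor through the REST API" step more concrete, here is a minimal Python sketch of the call that InvokeHTTP would have to make. It assumes an unsecured NiFi instance (no authentication), and the base URL and processor id used below are placeholders, not values from this thread. The helper names build_stop_request and stop_processor are made up for this illustration. The sketch first reads the processor to get its current revision version, then sends a PUT to the processor's run-status endpoint with the state set to STOPPED.

```python
import json
import urllib.request


def build_stop_request(base_url: str, processor_id: str, revision_version: int):
    """Return (url, body) for a run-status change to STOPPED.

    base_url is assumed to end in /nifi-api, e.g. http://localhost:8080/nifi-api
    (placeholder host and port).
    """
    url = f"{base_url.rstrip('/')}/processors/{processor_id}/run-status"
    body = {
        # NiFi rejects the update if this does not match the current revision.
        "revision": {"version": revision_version},
        "state": "STOPPED",
    }
    return url, body


def stop_processor(base_url: str, processor_id: str) -> int:
    """Look up the current revision, then ask NiFi to stop the processor.

    Assumes no authentication; a secured instance would also need a token
    or client certificate, which is not shown here.
    """
    entity_url = f"{base_url.rstrip('/')}/processors/{processor_id}"
    with urllib.request.urlopen(entity_url) as resp:
        entity = json.load(resp)
    version = entity["revision"]["version"]

    url, body = build_stop_request(base_url, processor_id, version)
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as resp:
        return resp.status  # HTTP status code of the run-status update
```

Inside NiFi, the same PUT could be issued by InvokeHTTP with the JSON body supplied as the FlowFile content. The revision version still has to match the processor's current revision, so the flow would need to look it up first, just as stop_processor does above.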

Documentation:
NiFi REST API: https://nifi.apache.org/docs/nifi-docs/rest-api/index.html

NiFi ValidateRecord: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.20.0/org.apach...

NiFi InvokeHTTP: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.20.0/org.apach...
NiFi ValidateCsv: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.20.0/org.apach...

How To ValidateRecord: https://community.cloudera.com/t5/Community-Articles/Using-Apache-NiFi-to-Validate-that-Records-Adhe...

 

 
