About cotopaul

cotopaul · ‎05-09-2023

@nuxeo-nifi, what I would try to implement as a quick solution is:
1. Configure your ValidateRecord (or maybe even try ValidateCsv) so that it identifies when the records from your CSV are not valid.
2. From ValidateRecord, you have 3 possible queues:
- failure --> which you might want to connect to an alert system, like PutEmail for example.
- valid --> which you might want to connect to your further processing.
- invalid --> what you are actually looking for :). Here, you can use an InvokeHTTP to call NiFi's REST API and stop your ValidateRecord processor. In this way, if a single message is rejected, your entire flow will be stopped. This is not the best way to do things, but if this is your project requirement, this is what you should do.
2a. From ValidateCsv, you have 2 possible queues:
- valid --> which you might want to connect to your further processing.
- invalid --> what you are actually looking for :). Here, you can use an InvokeHTTP to call NiFi's REST API and stop your ValidateCsv processor, with the same effect and the same caveat as above.
3. If you are using this flow in a so-called streaming mode (you get files every second), you should modify ValidateRecord to run every 2 or 5 seconds (or something like that), so that you have time to stop your processor using InvokeHTTP. If you leave Run Schedule at its default of 0 sec, you will process some additional messages before you are able to stop your processor.
Documentation:
NiFi REST API: https://nifi.apache.org/docs/nifi-docs/rest-api/index.html
NiFi ValidateRecord: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.20.0/org.apache.nifi.processors.standard.ValidateRecord/index.html
NiFi InvokeHTTP: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.20.0/org.apache.nifi.processors.standard.InvokeHTTP/index.html
NiFi ValidateCsv: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.20.0/org.apache.nifi.processors.standard.ValidateCsv/index.html
How To ValidateRecord: https://community.cloudera.com/t5/Community-Articles/Using-Apache-NiFi-to-Validate-that-Records-Adhere-to-a/ta-p/247299
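
To make the "stop the processor through the REST API" step more concrete, here is a minimal sketch in Python of the request that InvokeHTTP would have to send. It assumes the `PUT /processors/{id}/run-status` endpoint of the NiFi REST API and that the call is authenticated with a bearer token. The host name, the processor UUID and the helper names `build_stop_request` and `stop_processor` are placeholders made up for illustration, so treat this as a starting point and not as a tested recipe for your cluster.

```python
import json
import urllib.request

# Hypothetical values: replace with your own NiFi URL and processor UUID.
NIFI_API = "https://nifi.example.com:8443/nifi-api"
PROCESSOR_ID = "00000000-0000-0000-0000-000000000000"


def build_stop_request(api_base, processor_id, revision_version):
    """Return (url, json_body) for a PUT that sets the processor to STOPPED."""
    url = f"{api_base.rstrip('/')}/processors/{processor_id}/run-status"
    body = {
        # NiFi uses the revision for optimistic locking, so it must be current.
        "revision": {"version": revision_version},
        "state": "STOPPED",
    }
    return url, json.dumps(body)


def stop_processor(api_base, processor_id, token=None):
    """Read the current revision of the processor, then send the stop request."""
    headers = {"Content-Type": "application/json"}
    if token:  # assumption: token-based authentication is in use
        headers["Authorization"] = f"Bearer {token}"

    # Step 1: GET the processor to learn its current revision version.
    get_req = urllib.request.Request(
        f"{api_base.rstrip('/')}/processors/{processor_id}", headers=headers
    )
    with urllib.request.urlopen(get_req) as resp:
        version = json.load(resp)["revision"]["version"]

    # Step 2: PUT the new run status with that revision.
    url, body = build_stop_request(api_base, processor_id, version)
    put_req = urllib.request.Request(
        url, data=body.encode("utf-8"), headers=headers, method="PUT"
    )
    with urllib.request.urlopen(put_req) as resp:
        return json.load(resp)
```

In the flow itself, this would probably translate to an InvokeHTTP with HTTP Method = PUT, the `run-status` URL as Remote URL, and the JSON body as FlowFile content. Because the revision version has to match the current one, you might need a first InvokeHTTP (GET) to read it before sending the PUT.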

cotopaul · ‎05-09-2023

@nuxeo-nifi, the processors you are referring to do not belong to any NiFi version (Cloudera or open source), meaning that they were built in-house, specifically for you and your project. In this case, you would need to speak to those who developed those processors and identify the application logic. Once you have that, you can use PutEmail to send email notifications and InvokeHTTP to do the other actions. I assume that your processors have a failure queue, which could be linked to a PutEmail processor, in which you define whatever you want to be sent as a notification. In case of no failures, you can link the success queue of your nuxeo processor to InvokeHTTP and perform the call you require. For that, make sure that all your certificates are in place and allow connections between the systems. Otherwise, you won't be able to use InvokeHTTP and you would have to find another solution, like a script.

cotopaul · ‎05-04-2023

You do not install the Cloudera version on your laptop 🙂 You need Cloudera DataFlow for Public Cloud (CDF-PC), meaning that we are talking here about a license and some services. As @steven-matison already provided you with the perfect answer to your question, he might also be in a position to further assist you with everything you need to know about Cloudera DataFlow and the Public Cloud. Unfortunately, I am still learning about what Cloudera offers and how, so I am not the best person to answer your question. If you are going to use NiFi for some real data processing, I strongly recommend that you have a look at Cloudera DataFlow, as this will solve many issues and headaches 🙂

cotopaul · ‎05-04-2023

@danielhg1285, while the solution provided by @SAMSAL seems to be better for you and more production ready, you could also try the things below. This might work if you are using the same statement all the time and if you do not strictly need to see the exact INSERT statement, but only the values that you were trying to insert.
- Right after RetryFlowFile, you can add an AttributesToJSON processor and manually define all the columns which you want to insert in the Attributes List property. Make sure that you use the attribute names from your FlowFile (sql.args.N.value) in the correct order and that you set Destination = flowfile-content. In this way, you will generate a JSON file with all the columns and all the values which you have tried to insert but failed.
- After AttributesToJSON, you can keep your PutFile to save the file locally on your machine, so you can open it whenever and wherever you want 🙂
PS: This is maybe not the best solution, for the following reasons, but it will get you started:
- You will need to know how many columns you have to insert, and each time a new column is added you will have to modify your AttributesToJSON processor.
- You will not get the exact SQL INSERT/UPDATE statement, but a JSON file containing the column-value pairs, which can easily be analyzed by anybody.
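
To give an idea of what the resulting file would contain, here is a small Python sketch that imitates what AttributesToJSON does with an Attributes List and Destination = flowfile-content. It is only an illustration and not NiFi code: the function name `attributes_to_json` and the sample attribute values are made up, and it assumes that a listed attribute which is missing from the FlowFile ends up as an empty string.

```python
import json


def attributes_to_json(attributes, attributes_list):
    """Imitate AttributesToJSON: keep only the listed attributes, in list order."""
    names = [name.strip() for name in attributes_list.split(",") if name.strip()]
    # Assumption: a listed attribute missing from the FlowFile becomes "".
    selected = {name: attributes.get(name, "") for name in names}
    return json.dumps(selected)


# Hypothetical FlowFile attributes for a failed two-column INSERT.
flowfile_attributes = {
    "filename": "batch-001",
    "sql.args.1.value": "42",
    "sql.args.2.value": "Alice",
}

content = attributes_to_json(
    flowfile_attributes, "sql.args.1.value,sql.args.2.value"
)
print(content)
```

The keys in the JSON are the attribute names (sql.args.N.value) and not the column names, so the order in the Attributes List is what tells you which value belongs to which column.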

DianaTorres · ‎05-02-2023

@acasta Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks

DianaTorres · ‎05-02-2023

@Vas_R Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks

DianaTorres · ‎05-02-2023

@Amit_barnwal Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks

VLban · ‎04-28-2023

Another interesting point is how to implement, with UpdateAttribute, a check of whether a file was actually delivered by PutHDFS. If HDFS runs out of space, the process keeps going: the files are not written, but they are still taken out of the queue and the flow moves on to the next file in the queue. In fact, files are lost if the space runs out and the chain does not stop. You need to check whether the file arrived in HDFS and, if it did not, stop the flow; or, if HDFS has run out of space, stop PutHDFS and let the queue fill up.

AntonBV · ‎04-27-2023

@cotopaul, Thank you a lot. That is exactly what I need.

jame1997 · ‎04-27-2023

Hello @MattWho @SAMSAL @steven-matison @DigitalPlumber @cotopaul, after confirming the access, I have verified that I can access the bucket and download files with the AWS CLI on the same system NiFi is running on. The problem I'm having now is that ListS3 picks up the list of files from the bucket, but FetchS3 doesn't do anything. When I enabled debugging on FetchS3, I received the following error message. [Screenshots: FetchS3 error message, FetchS3 configuration, ListS3 configuration] Any suggestions as to what is causing the issue?

Last Visited	‎03-14-2024 06:37 AM

Member Since	‎01-27-2023 08:25 AM
Posts	229
Kudos received	73

Cloudera Community
