Member since: 02-01-2022
Posts: 274
Kudos Received: 97
Solutions: 60
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 401 | 05-15-2025 05:45 AM |
 | 3396 | 06-12-2024 06:43 AM |
 | 5926 | 04-12-2024 06:05 AM |
 | 4056 | 12-07-2023 04:50 AM |
 | 2181 | 12-05-2023 06:22 AM |
06-01-2022
07:48 AM
1 Kudo
@leandrolinof I believe you are looking for a brand-new NiFi feature introduced in 1.16 that lets you control failure and retry: framework-level retry is now supported. For many years, users built flows in various ways to make retries happen for a configured number of attempts. Now this is easily and cleanly configured in the UI/API, which simplifies the user experience and flow design considerably. To those who have waited years for this: thank you for your patience. Reference: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.16.0 You can find more about what's new in NiFi 1.16 in the video below. https://www.youtube.com/watch?v=8G6niPKntTc Mark also shows a bit of the new retry mechanism around 11:50.
05-31-2022
05:48 AM
@dfdf As the error suggests, you need to install the MySQL connector. I believe this link will get you there: https://docs.cloudera.com/csa/1.3.0/installation/topics/csa-ssb-configuring-mysql.html#ariaid-title3
05-24-2022
06:02 AM
1 Kudo
@FediMannoubi Below is a basic approach. Assuming both Postgres tables are populated with rows per your example, your NiFi flow would need to:

1. Get the CSV into a flowfile (there are various ways to do that; I used GenerateFlowFile for testing).
2. Read the CSV with a record-based processor. QueryRecord lets you write SQL against the flowfile to get a single value, for example: SELECT city_name FROM FLOWFILE
3. Get the city_name value into an attribute; I used EvaluateJsonPath.
4. Use ExecuteSQL with an associated DBCPConnectionPool pointed at Postgres. In ExecuteSQL your query is: SELECT city_id FROM CITY WHERE city_name=${city_name}

At the end of this flow you will have the city_name from the CSV and the city_id from Postgres, which you can combine or use further downstream to suit your needs. INSERT is done similarly: once you have the data in flowfiles or attributes, use the same ExecuteSQL and write an INSERT instead.

My test flow looks like this, but forgive the end, as I did not actually have a Postgres database set up. You can find this sample flow [here]. I hope this gets you pointed in the right direction for reading CSV and querying data from a database.
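Outside NiFi, the same lookup logic can be sketched in plain Python, with sqlite3 standing in for Postgres and the csv module standing in for the record reader. This is only an analogy of what the flow does; the table and column names follow the example above, not any real schema:

```python
import csv
import io
import sqlite3

# In-memory database standing in for Postgres (hypothetical schema/data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (city_id INTEGER, city_name TEXT)")
conn.execute("INSERT INTO city VALUES (1, 'Paris'), (2, 'Tunis')")

# The flowfile content: a CSV with a city_name column.
flowfile = io.StringIO("city_name\nTunis\n")
row = next(csv.DictReader(flowfile))  # like QueryRecord: SELECT city_name FROM FLOWFILE
city_name = row["city_name"]          # like EvaluateJsonPath: value -> attribute

# Like ExecuteSQL: look up city_id using the extracted value.
city_id = conn.execute(
    "SELECT city_id FROM city WHERE city_name = ?", (city_name,)
).fetchone()[0]
print(city_name, city_id)  # Tunis 2
```

Note that the Python version uses a bound parameter (`?`) where the NiFi query uses expression-language substitution of the attribute.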
05-17-2022
12:35 PM
Nice one sir!
05-17-2022
12:34 PM
@joshtheflame CDP Private Cloud Base, for on-premises deployments, can be deployed on OpenShift Kubernetes. CDP Public Cloud, in AWS, Azure, or GCP, is fully Kubernetes-deployed on the respective cloud's Kubernetes platform. CDP is hybrid and multi-cloud capable as well. Check out CDP Private Cloud Base: https://docs.cloudera.com/data-warehouse/1.3.1/openshift-environments/topics/dw-private-cloud-openshift-environments-overview.html and CDP Public Cloud: https://docs.cloudera.com/cdp/latest/overview/topics/cdp-overview.html
05-16-2022
10:43 AM
@IsaacKamalesh yes. A support request will need to be issued to get the correct networking allowed.
05-03-2022
12:35 PM
@Ghilani I believe the solution here is to use record-based processors: a JSON record reader whose schema matches the original underscore (_) field names, and a JSON record writer whose schema uses the new names.
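The reader/writer pair effectively re-keys each record from the old names to the new ones. The rename can be sketched outside NiFi as a simple key mapping; the field names below are made up for illustration:

```python
import json

# Hypothetical mapping from underscore names (reader schema)
# to the new names (writer schema).
RENAMES = {"first_name": "firstName", "last_name": "lastName"}

def rename_fields(record: dict) -> dict:
    """Re-key a record, leaving unmapped fields untouched."""
    return {RENAMES.get(k, k): v for k, v in record.items()}

raw = '{"first_name": "Ada", "last_name": "Lovelace", "age": 36}'
print(json.dumps(rename_fields(json.loads(raw))))
```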
04-28-2022
05:01 AM
Do not think of the number of concurrent tasks (concurrency) and the run schedule for a processor as relating to request/response timing. The request/response time could be almost instant, or as long as the other end takes to respond, specifically in reference to InvokeHTTP. Concurrent tasks are used to run multiple instances against that processor at once, usually to help drain a huge queue of flowfiles (thousands, tens of thousands, millions, etc.). Run schedule is how long one instance stays active (able to process more than one flowfile in sequence). Hope this helps, Steven
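As a rough analogy (not NiFi internals), concurrency changes how fast a queue drains, not how long each individual request takes. A sketch with a thread pool, where the worker count and sleep time are arbitrary stand-ins:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def invoke_http(flowfile: int) -> int:
    """Stand-in for InvokeHTTP: latency is set by the remote end."""
    time.sleep(0.01)  # ~10 ms per request regardless of concurrency
    return flowfile

queue = list(range(100))  # a backlog of queued flowfiles
start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:  # "10 concurrent tasks"
    results = list(pool.map(invoke_http, queue))
elapsed = time.time() - start
# 100 items at ~10 ms each drain in roughly 0.1 s with 10 workers
# instead of ~1 s with one, but each request still takes ~10 ms.
```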
04-28-2022
04:54 AM
@Rohan44 maybe you have a typo above, but try: SELECT COUNT(*) FROM FLOWFILE
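That query can be sanity-checked against any SQL engine; here is a quick check with sqlite3, where the table name is just a stand-in for the flowfile's rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flowfile (value TEXT)")
conn.executemany("INSERT INTO flowfile VALUES (?)", [("a",), ("b",), ("c",)])

# The same shape of query QueryRecord runs against the flowfile.
count = conn.execute("SELECT COUNT(*) FROM flowfile").fetchone()[0]
print(count)  # 3
```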
04-28-2022
04:51 AM
@BlueShangai First, HDP is very outdated; I recommend alternative, newer toolsets. That said, in my previous experience with this sandbox, even a 16 GB machine struggles to deploy the entire cluster, and if it does, there will be stability issues with one or more components. When I used a 32 GB machine, the stack came up and was a much more stable sandbox. However, the HDP Sandbox is a giant stack of services designed for many machines, so use something like this with caution.