Member since: 02-01-2022
Posts: 274
Kudos Received: 97
Solutions: 60
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 401 | 05-15-2025 05:45 AM |
 | 3396 | 06-12-2024 06:43 AM |
 | 5926 | 04-12-2024 06:05 AM |
 | 4056 | 12-07-2023 04:50 AM |
 | 2181 | 12-05-2023 06:22 AM |
06-01-2022
07:48 AM
1 Kudo
@leandrolinof I believe you are looking for a brand-new NiFi feature introduced in 1.16 that lets you control failure and retry: framework-level retry is now supported. For many years, users built flows in various ways to make retries happen for a configured number of attempts. Now this is easily and cleanly configured in the UI/API, which simplifies the user experience and flow design considerably. To those who have waited years for this: thank you for your patience. Reference: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.16.0 You can find more about what's new in NiFi 1.16 in the video below. https://www.youtube.com/watch?v=8G6niPKntTc Mark also shows a bit of the new retry mechanism around 11:50.
05-31-2022
05:48 AM
@dfdf As the error suggests, you need to install the MySQL connector. I believe this link will get you there: https://docs.cloudera.com/csa/1.3.0/installation/topics/csa-ssb-configuring-mysql.html#ariaid-title3
05-24-2022
06:02 AM
1 Kudo
@FediMannoubi Below is a basic approach. Assuming both Postgres tables are populated with rows per your example, your NiFi flow would need to:

1. Get the CSV into a flowfile (there are various ways to do that; I used GenerateFlowFile for testing).
2. Read the CSV with a record-based processor. QueryRecord lets you write SQL against the flowfile to get a single value, for example: SELECT city_name FROM FLOWFILE
3. Get the city_name value into an attribute; I used EvaluateJsonPath.
4. Use ExecuteSQL with an associated DBCPConnectionPool pointed at Postgres. In ExecuteSQL your query is: SELECT city_id FROM CITY WHERE city_name=${city_name}

At the end of this flow you will have the city_name from the CSV and the city_id from Postgres, which you can combine or use further downstream to suit your needs. INSERT is done similarly: once you have the data in flowfiles or attributes, use the same ExecuteSQL and write an INSERT instead.

My test flow looks like this, but forgive the end, as I did not actually have a Postgres database set up. You can find this sample flow [here]. I hope this gets you pointed in the right direction for reading CSV and querying data from a database.
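Outside NiFi, the same lookup logic can be sketched in plain Python, with sqlite3 standing in for Postgres and the csv module standing in for the record reader. This is only an analogy of what the flow does; the table and column names follow the example above, not any real schema:

```python
import csv
import io
import sqlite3

# In-memory database standing in for Postgres (hypothetical schema/data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (city_id INTEGER, city_name TEXT)")
conn.execute("INSERT INTO city VALUES (1, 'Paris'), (2, 'Tunis')")

# The flowfile content: a CSV with a city_name column.
flowfile = io.StringIO("city_name\nTunis\n")
row = next(csv.DictReader(flowfile))  # like QueryRecord: SELECT city_name FROM FLOWFILE
city_name = row["city_name"]          # like EvaluateJsonPath: value -> attribute

# Like ExecuteSQL: look up city_id using the extracted value.
city_id = conn.execute(
    "SELECT city_id FROM city WHERE city_name = ?", (city_name,)
).fetchone()[0]
print(city_name, city_id)  # Tunis 2
```

Note that the Python version uses a bound parameter (`?`) where the NiFi query uses expression-language substitution of the attribute.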
05-17-2022
12:35 PM
Nice one sir!
05-17-2022
12:34 PM
@joshtheflame CDP Private Cloud Base, for on-premises deployments, can be deployed on OpenShift Kubernetes. CDP Public Cloud, in AWS, Azure, or GCP, is fully Kubernetes-deployed on the respective cloud's Kubernetes platform. CDP is hybrid and multi-cloud capable as well. Check out CDP Private Cloud Base: https://docs.cloudera.com/data-warehouse/1.3.1/openshift-environments/topics/dw-private-cloud-openshift-environments-overview.html and CDP Public Cloud: https://docs.cloudera.com/cdp/latest/overview/topics/cdp-overview.html
05-16-2022
10:43 AM
@IsaacKamalesh yes. A support request will need to be issued to get the correct networking allowed.
05-03-2022
12:35 PM
@Ghilani I believe the solution here is to use record-based processors: a JSON record reader whose schema matches the original underscore (_) field names, and a JSON record writer whose schema uses the new names.
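The reader/writer pair effectively re-keys each record from the old names to the new ones. The rename can be sketched outside NiFi as a simple key mapping; the field names below are made up for illustration:

```python
import json

# Hypothetical mapping from underscore names (reader schema)
# to the new names (writer schema).
RENAMES = {"first_name": "firstName", "last_name": "lastName"}

def rename_fields(record: dict) -> dict:
    """Re-key a record, leaving unmapped fields untouched."""
    return {RENAMES.get(k, k): v for k, v in record.items()}

raw = '{"first_name": "Ada", "last_name": "Lovelace", "age": 36}'
print(json.dumps(rename_fields(json.loads(raw))))
```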
04-28-2022
05:01 AM
Do not think of the number of concurrent tasks (concurrency) and the run schedule for a processor as relating to request/response timing. The request/response time could be almost instant, or as long as the other end takes to respond, specifically in reference to InvokeHTTP. Concurrent tasks are used to run multiple instances against that processor at once, usually to help drain a huge queue of flowfiles (thousands, tens of thousands, millions, etc.). Run schedule is how long one instance stays active (able to process more than one flowfile in sequence). Hope this helps, Steven
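As a rough analogy (not NiFi internals), concurrency changes how fast a queue drains, not how long each individual request takes. A sketch with a thread pool, where the worker count and sleep time are arbitrary stand-ins:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def invoke_http(flowfile: int) -> int:
    """Stand-in for InvokeHTTP: latency is set by the remote end."""
    time.sleep(0.01)  # ~10 ms per request regardless of concurrency
    return flowfile

queue = list(range(100))  # a backlog of queued flowfiles
start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:  # "10 concurrent tasks"
    results = list(pool.map(invoke_http, queue))
elapsed = time.time() - start
# 100 items at ~10 ms each drain in roughly 0.1 s with 10 workers
# instead of ~1 s with one, but each request still takes ~10 ms.
```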
04-28-2022
04:54 AM
@Rohan44 maybe you have a typo above, but try: SELECT COUNT(*) FROM FLOWFILE
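That query can be sanity-checked against any SQL engine; here is a quick check with sqlite3, where the table name is just a stand-in for the flowfile's rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flowfile (value TEXT)")
conn.executemany("INSERT INTO flowfile VALUES (?)", [("a",), ("b",), ("c",)])

# The same shape of query QueryRecord runs against the flowfile.
count = conn.execute("SELECT COUNT(*) FROM flowfile").fetchone()[0]
print(count)  # 3
```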
04-28-2022
04:51 AM
@BlueShangai First, HDP is very outdated; I recommend alternative, newer toolsets. That said, in my previous experience with this sandbox, even a 16 GB machine struggles to deploy the entire cluster, and if it does, there will be stability issues with one or more components. When I used a 32 GB machine, the stack came up and was a much more stable sandbox. However, the HDP Sandbox is a giant stack of services designed for many machines, so use something like this with caution.