Member since
02-01-2022
281
Posts
103
Kudos Received
60
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1120 | 05-15-2025 05:45 AM |
| | 4947 | 06-12-2024 06:43 AM |
| | 7918 | 04-12-2024 06:05 AM |
| | 5815 | 12-07-2023 04:50 AM |
| | 3202 | 12-05-2023 06:22 AM |
05-31-2022
05:48 AM
@dfdf As the error suggests, you need to install the MySQL connector. I believe this link will get you there: https://docs.cloudera.com/csa/1.3.0/installation/topics/csa-ssb-configuring-mysql.html#ariaid-title3
05-24-2022
06:02 AM
1 Kudo
@FediMannoubi Below is a basic approach. Assuming both Postgres tables are populated with rows per your example, your NiFi flow would need to:

1. Get the CSV into a flowfile (there are various ways to do that; I used GenerateFlowFile for testing).
2. Use a record-reader-based processor to read the CSV. QueryRecord lets you write SQL against the flowfile to pull out a single value, for example: SELECT city_name FROM FLOWFILE
3. Get the city_name value into an attribute; I use EvaluateJsonPath.
4. Add an ExecuteSQL processor with an associated DBCPConnectionPool to Postgres. In ExecuteSQL your query is: SELECT city_id FROM CITY WHERE city_name=${city_name}

At the end of this flow you will have the city_name from the CSV and the city_id from Postgres. You can now combine them or use them further downstream to suit your needs. INSERT is done similarly: once you have the data in flowfiles or attributes, use the same ExecuteSQL but write an INSERT statement instead. My test flow looks like this, but forgive the end, as I did not actually have a Postgres database set up. You can find this sample flow [here]. I hope this gets you pointed in the right direction for reading a CSV and querying data from a database.
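Outside NiFi, the lookup logic in the steps above can be sketched in plain Python. This is only an analogy of what the flow does, not NiFi itself; sqlite3 stands in for Postgres so the sketch is self-contained, and the table/column names follow the example:

```python
import csv
import io
import sqlite3

# Stand-in for the Postgres CITY table (sqlite3 used so this runs anywhere).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (city_id INTEGER, city_name TEXT)")
conn.execute("INSERT INTO city VALUES (1, 'Paris'), (2, 'Tunis')")

# Stand-in for the CSV flowfile contents.
csv_data = "city_name\nTunis\n"

# Steps 1-3: read the CSV and extract the single city_name value
# (QueryRecord's "SELECT city_name FROM FLOWFILE" does this inside NiFi).
row = next(csv.DictReader(io.StringIO(csv_data)))
city_name = row["city_name"]

# Step 4: look up city_id, as ExecuteSQL would against Postgres.
cur = conn.execute("SELECT city_id FROM city WHERE city_name = ?", (city_name,))
city_id = cur.fetchone()[0]
print(city_name, city_id)  # Tunis 2
```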
05-17-2022
12:35 PM
Nice one sir!
05-17-2022
12:34 PM
@joshtheflame CDP Private Cloud Base, for on-prem, can be deployed on OpenShift Kubernetes. CDP Public Cloud, in AWS, Azure, or GCP, is fully Kubernetes-deployed on the respective cloud's Kubernetes platform. CDP is hybrid and multi-cloud capable as well. Check out CDP Private Cloud Base: https://docs.cloudera.com/data-warehouse/1.3.1/openshift-environments/topics/dw-private-cloud-openshift-environments-overview.html and CDP Public Cloud: https://docs.cloudera.com/cdp/latest/overview/topics/cdp-overview.html
04-28-2022
05:01 AM
Do not think of the number of concurrent tasks and the run schedule for a processor as relating to request/response timing. The request/response time could be almost instant, or as long as the other end takes to respond, specifically in reference to InvokeHTTP. Concurrent tasks are used to run a higher number of unique instances against that processor, usually to help drain a huge queue of flowfiles (1,000s, 10,000s, 1,000,000s, etc.). Run schedule is how long that one instance stays active (able to process more than one flowfile in sequence). Hope this helps, Steven
04-26-2022
06:24 AM
2 Kudos
@jonay__reyes I think by default you will see the result you are expecting; however, the expected limit of 5 concurrent connections may be a challenge. Let's address your questions first:

Does this translate to simply using 1 InvokeHTTP processor configured to 5 "Concurrent Tasks" and that's it? - No. One processor with 5 concurrent tasks provides what are in effect 5 instance copies, and each can run more than 5 requests if there are ample flowfiles queued up. For your use case, I would recommend that you set it to 1 and control the number of flowfiles upstream.

Will the processor wait for the remote endpoint's response before sending the next one? - Yes, if concurrent tasks is set to 1. No, if set higher (2+); they will execute in parallel.

How does the "Run Schedule" work together with the previous settings? (if I had, e.g., 1 sec) - Run Schedule sets how long a process will operate before a new instance is necessary. If the request/response times are low, this setting will allow you to push more data through each instance without creating separate processes for each. If the request/response time is high, you can use this to help with long executions. Experiment carefully here.

I've been proposed with splitting the incoming queue and putting 5 InvokeHTTP processors in parallel, each one attending 1/5 of the incoming flowfiles (I'd do the pre-partitioning with some RouteOnAttribute trick), but I think it's exactly the same outcome as 1. above. Is it? - Correct; there is no reason to do this. Avoid duplicating processors.

For concurrent tasks and run schedule adjustments, you should always experiment in small increments, changing one setting at a time, evaluating, and repeating until you find the right balance. I suspect that you will not need 5 long-running request/responses in parallel, and that even with default settings your queued flowfiles will execute fast enough to appear "simultaneous".
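As a rough analogy (plain Python, not NiFi itself), concurrent tasks behave like workers in a thread pool draining a queue of flowfiles; with a single worker, requests are strictly serialized in queue order, while more workers allow requests in flight simultaneously:

```python
from concurrent.futures import ThreadPoolExecutor

def fake_request(i):
    # Stand-in for one InvokeHTTP request/response round trip.
    return i

queue = list(range(10))  # the queued flowfiles

# "Concurrent Tasks = 1": one worker, requests run one at a time, in order.
with ThreadPoolExecutor(max_workers=1) as pool:
    serial = list(pool.map(fake_request, queue))

# "Concurrent Tasks = 5": up to five requests in flight at once.
with ThreadPoolExecutor(max_workers=5) as pool:
    parallel = list(pool.map(fake_request, queue))

print(serial == queue)  # True: a single worker preserves queue order
```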
04-26-2022
06:06 AM
@Tokolosk I would recommend providing Java's cacerts as the keystore/truststore with the default password (changeit). This will work for most public certs. If you have a custom cert at the SFTP end, you will need to import that cert and create your own keystore/truststore. I always try cacerts first.
04-07-2022
05:15 AM
2 Kudos
@krishna123 This is how NiFi is supposed to work. If there is no input, then even though the processor is Play/Green (started), it is not doing anything. It is only "running" if there is input.
04-06-2022
05:54 AM
@krishna123 NiFi data flows are expected to operate in an always-on capacity. If no input is arriving, the processor is not doing anything, so there is no reason to stop it. Can you explain in more detail why you want it actually stopped?
04-06-2022
05:37 AM
Ahh, that certainly is a different challenge, which would require a slightly different approach. My best recommendation, outside of making your own processors, would be a combination of API calls to prepare variables for a process group that uses ListSFTP and FetchSFTP (or GetSFTP) once the variables are set up.