Member since: 02-01-2022
Posts: 37
Kudos Received: 10
Solutions: 7
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 77 | 06-01-2022 07:48 AM |
|  | 81 | 05-24-2022 06:02 AM |
|  | 125 | 04-26-2022 06:24 AM |
|  | 301 | 04-07-2022 05:15 AM |
|  | 341 | 04-06-2022 05:54 AM |
06-23-2022
09:47 AM
@araujo It would be awesome if you could also link a flow definition file....
06-13-2022
06:38 AM
1 Kudo
I have done something similar when I need to deliver jar files to all nodes. It's definitely a "this is not how things are done" approach, but in this case I did not have access to the nodes' file systems outside of a flow. That said, it works great: the first processor creates a flowfile on every node (even when I don't know the node count), then each node checks for the file and, if it is not found, fetches it and writes it to the file system.
06-07-2022
05:20 AM
Fun with Python: you are going to need to resolve all dependencies yourself. I am not familiar with the last error, but it's definitely saying psycopg2 cannot be found.
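As a quick sketch (the module list is just an example), you can check which dependencies are actually resolvable from the interpreter your flow invokes before debugging further:

```python
import importlib.util

def missing_modules(names):
    """Return the module names the current interpreter cannot import."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Run this with the same interpreter NiFi uses; if "psycopg2" comes back,
# install it into that environment (e.g. pip install psycopg2-binary).
print(missing_modules(["json", "psycopg2"]))
```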
06-01-2022
12:29 PM
@Ankscribe Please do not respond to old threads; just create a new one. That said, after putting the custom processor into the right place in NiFi, you have to delete the work directory and restart NiFi. See this comment: https://community.cloudera.com/t5/Support-Questions/Created-a-custom-nifi-processor-after-placing-nar-file-in/m-p/179422/highlight/true#M141668
06-01-2022
10:17 AM
1 Kudo
Yes, not available before 1.16. Definitely a great new feature!!
06-01-2022
07:48 AM
1 Kudo
@leandrolinof I believe you are looking for a brand new NiFi feature in 1.16 that lets you control failure and retry: framework-level retry is now supported. For many years users built flows in various ways to retry for a configured number of attempts; now this is easily and cleanly configured in the UI/API, which simplifies the user experience and flow design considerably. To those who waited years for this: thank you for your patience. Reference: https://cwiki.apache.org/confluence/display/NIFI/Release+Notes#ReleaseNotes-Version1.16.0 You can find more about what's new in NiFi 1.16 in the video below; Mark also shows a bit of the new retry mechanism around 11:50. https://www.youtube.com/watch?v=8G6niPKntTc
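For context, here is a minimal sketch of the retry-with-backoff pattern that flows had to emulate by hand before 1.16 (function and parameter names are mine, not NiFi's):

```python
import time

def with_retries(fn, max_attempts=3, backoff_seconds=0.01):
    """Call fn, retrying up to max_attempts times with growing backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # attempts exhausted: analogous to routing to failure
            time.sleep(backoff_seconds * attempt)

# In NiFi 1.16 this loop (attempt count, backoff, terminal failure routing)
# is configured per relationship in the UI/API instead of built into the flow.
```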
05-31-2022
05:48 AM
@dfdf As the error suggests, you need to install the MySQL connector. I believe this link will get you there: https://docs.cloudera.com/csa/1.3.0/installation/topics/csa-ssb-configuring-mysql.html#ariaid-title3
05-24-2022
06:02 AM
1 Kudo
@FediMannoubi Below is a basic approach. Assuming both Postgres tables are populated with rows per your example, your NiFi flow would need to:

1. Get the CSV into a flowfile (there are various ways to do that; I use GenerateFlowFile in my test).
2. Read the CSV with a record-based processor. QueryRecord lets you write SQL against the flowfile to pull out a single value, for example: SELECT city_name FROM FLOWFILE
3. Get the city_name value into an attribute; I use EvaluateJsonPath.
4. Run ExecuteSQL with an associated DBCPConnectionPool to Postgres, using the query: SELECT city_id FROM CITY WHERE city_name = '${city_name}'

At the end of this flow you will have the city_name from the CSV and the city_id from Postgres, which you can combine or use further downstream to suit your needs. INSERT is done similarly: once you have the data in flowfiles or attributes, the same ExecuteSQL can run an insert instead. My test flow looks like this, but forgive the end, as I did not actually have a Postgres database set up. You can find this sample flow [here]. I hope this gets you pointed in the right direction for reading CSV and querying data from a database.
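The same lookup logic can be sketched outside NiFi; this hypothetical Python version uses an in-memory SQLite table standing in for the Postgres CITY table:

```python
import csv
import io
import sqlite3

# The CSV contents of the flowfile, queried for city_name (the QueryRecord step).
csv_content = "city_name,population\nParis,2100000\n"
city_name = next(csv.DictReader(io.StringIO(csv_content)))["city_name"]

# The database lookup (the ExecuteSQL step; SQLite stands in for Postgres here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE city (city_id INTEGER, city_name TEXT)")
conn.execute("INSERT INTO city VALUES (1, 'Paris')")
row = conn.execute(
    "SELECT city_id FROM city WHERE city_name = ?", (city_name,)
).fetchone()
print(city_name, row[0])  # the two values the flow ends up holding
```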
05-17-2022
12:35 PM
Nice one sir!
05-17-2022
12:34 PM
@joshtheflame CDP Private Cloud Base, for on-prem, can be deployed on OpenShift Kubernetes. CDP Public Cloud, in AWS, Azure, or GCP, is fully Kubernetes-deployed on the respective cloud's Kubernetes platform. CDP is hybrid and multi-cloud capable as well. Check out CDP Private Cloud Base: https://docs.cloudera.com/data-warehouse/1.3.1/openshift-environments/topics/dw-private-cloud-openshift-environments-overview.html and CDP Public Cloud: https://docs.cloudera.com/cdp/latest/overview/topics/cdp-overview.html
05-16-2022
10:43 AM
@IsaacKamalesh yes. A support request will need to be issued to get the correct networking allowed.
05-03-2022
12:35 PM
@Ghilani I believe the solution here is to use record-based processors: a JSON record reader whose schema matches the _ names, and a JSON record writer whose schema matches the new names.
04-28-2022
05:01 AM
Do not think of the number of concurrent tasks and the run schedule for a processor as relating to request/response timing. The request/response time could be almost instant, or as long as the other end takes to respond, specifically in reference to InvokeHttp. Concurrent tasks give you a higher number of unique instances running against that processor, usually to help drain a huge queue of flowfiles (thousands, tens of thousands, millions, etc.). Run schedule is how long one instance stays active (able to process more than one flowfile in sequence). Hope this helps, Steven
04-28-2022
04:54 AM
@Rohan44 maybe you have a typo above, but try: SELECT COUNT(*) FROM FLOWFILE
04-28-2022
04:51 AM
@BlueShangai First, HDP is very outdated; I recommend newer toolsets instead. That said, in my previous experience with this sandbox, even a 16 GB machine struggles to deploy the entire cluster, and if it does, there will be stability issues with one or more components. When I used a 32 GB machine, the stack came up and was a much more stable sandbox. Even so, this HDP Sandbox is a giant stack of services designed for many machines, so use something like this with caution.
04-26-2022
06:24 AM
1 Kudo
@jonay__reyes I think by default you will see the result you are expecting; the expected limit of 5 concurrent connections is the real challenge. Taking your questions in order:

"Does this translate to simply using 1 InvokeHTTP processor configured to 5 Concurrent Tasks and that's it?" No. One processor with 5 concurrent tasks gives you, in effect, 5 instance copies, and each can run more than one request if there are ample flowfiles queued up. For your use case, I would recommend setting it to 1 and controlling the number of flowfiles upstream.

"Will the processor wait for the remote endpoint's response before sending the next one?" Yes, if concurrent tasks is set to 1. No if set higher (2+); those will execute in parallel.

"How does the Run Schedule work together with the previous settings? (e.g. 1 sec)" Run Schedule sets how long a process operates before a new instance is necessary. If request/response times are low, this setting lets you push more data through each instance without creating separate processes for each. If request/response times are high, you can use it to help with long executions. Experiment carefully here.

"I've been proposed with splitting the incoming queue and putting 5 InvokeHTTP processors in parallel, each attending 1/5 of the incoming flowfiles, but I think it's exactly the same outcome as 1. above. Is it?" Correct; there is no reason to do this. Avoid duplicating processors.

For concurrent tasks and run schedule adjustments, you should always experiment in small increments, changing one setting at a time, evaluating, and repeating until you find the right balance. I suspect you will not need 5 long-running request/responses in parallel, and that even with default settings your queued flowfiles will execute fast enough to appear "simultaneous".
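As an illustration (a toy model, not NiFi itself), here is concurrent-task behavior as a worker pool draining a queue, where each request sleeps for a fixed latency:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def drain_queue(num_flowfiles, concurrent_tasks, latency=0.05):
    """Model: each 'request' takes `latency` seconds; return total wall time."""
    start = time.monotonic()
    with ThreadPoolExecutor(max_workers=concurrent_tasks) as pool:
        list(pool.map(lambda _: time.sleep(latency), range(num_flowfiles)))
    return time.monotonic() - start

# With 1 task, 10 requests run back to back (~0.5 s); with 5 tasks they
# overlap (~0.1 s), but responses are no longer strictly ordered in time.
```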
04-26-2022
06:06 AM
@Tokolosk I would recommend providing Java's cacerts as the keystore/truststore with the default password (changeit). This will work for most public certs. If the SFTP end uses a custom cert, you will need to import that cert and create your own keystore/truststore. I always try cacerts first.
04-25-2022
01:31 PM
@Tra The recommended path forward here is to use a JSON record reader and a JSON record writer, with schemas matching your source (reader) and your downstream needs (writer). You want to avoid SplitJson and regex matching against flowfile contents.
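Outside of NiFi, the reader-to-writer rename amounts to a key mapping; a minimal sketch (field names here are hypothetical):

```python
def rename_keys(record, mapping):
    """Return a copy of record with keys renamed per mapping; others pass through."""
    return {mapping.get(key, key): value for key, value in record.items()}

# Example: the reader schema uses snake_case, the writer schema wants new names.
renamed = rename_keys({"first_name": "Ada", "age": 36},
                      {"first_name": "firstName"})
print(renamed)  # {'firstName': 'Ada', 'age': 36}
```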
04-08-2022
07:10 AM
@mala_etl I can't fill in all of them, since the values are yours, not mine, but to achieve the move you need to set: Completion Strategy: Move File; Move Destination Directory: the directory to move the file to; Create Directory: true or false. Be sure to check the ? for each property; it explains everything.
04-07-2022
05:15 AM
@krishna123 This is how NiFi is supposed to work. If there is no input, the processor is not doing anything even though it shows Play/Green; it only "runs" when there is input.
04-06-2022
06:04 AM
1 Kudo
@mala_etl Check out the Completion Strategy feature in FetchSFTP: it allows you to move the files into a new directory after fetching them.
04-06-2022
05:54 AM
@krishna123 NiFi data flows are expected to be operated in an always on capacity. If no input is arriving, the processor is not doing anything, there is no reason to stop it. Can you explain in more detail why you want it actually stopped?
04-06-2022
05:37 AM
Ahh, that certainly is a different challenge, which would require a slightly different approach. My best recommendation, outside of writing your own processors, would be a combination of API calls to prepare variables for a process group that uses ListSFTP and FetchSFTP (or GetSFTP) once the variables are set up.
04-05-2022
09:15 AM
1 Kudo
@mbraunerde Before GetSFTP, look at this processor first: ListSFTP. ListSFTP produces the list of files to get, which FetchSFTP (the input-driven counterpart to GetSFTP) then retrieves. Be sure to search for "SFTP" in your NiFi processor window to see all matching processors; this works for any keyword, and I use it often. Docs here: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.16.0/org.apache.nifi.processors.standard.ListSFTP/index.html
03-23-2022
02:29 PM
@Nifi_Al Use GetFile/PutFile to read and write in your flow. If you then run UnpackContent against a zipped flowfile, it should create individual flowfiles, which can be unpacked further. Alternatively, if you can write a script that unzips recursively, you can run it from NiFi with ExecuteScript, or even ExecuteProcess.
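If you go the script route, the recursive unzip could be sketched like this (a standalone sketch, not tied to any NiFi scripting API):

```python
import pathlib
import zipfile

def unzip_recursive(zip_path, out_dir):
    """Extract a zip, then extract any zips found inside it, recursively."""
    out = pathlib.Path(out_dir)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)  # creates out_dir if needed
    # Materialize the list first so newly extracted dirs don't confuse the walk.
    for inner in list(out.rglob("*.zip")):
        unzip_recursive(inner, inner.with_suffix(""))
```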
03-23-2022
02:10 PM
@azg Yes, just add the values you want in the processor configuration tab. Click + to add new key/value pairs.
03-16-2022
06:30 AM
@bdworld2 Given that HDP is EOL (End of Life) and the M1 chip is new and breaks anything expecting linux/amd64, I would recommend a different learning path. Check out CDP and build new clusters to learn the ASF components you are familiar with from HDP.
03-11-2022
05:24 AM
1 Kudo
@kumsath Yes, when I saw your post I started digging around too. The issue here is number manipulation and type casting a "number" versus a "string"; that easily gets complicated in attributes and with NiFi Expression Language functions. The solution I referenced would take the "number" and make a real number, whose visual representation should then be correct. It's worth a try, since that ExecuteScript should be very easy to test. The alternative is to do your "math" upstream or in an ExecuteScript processor, not in UpdateAttribute.
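For the underlying behavior (the sample value is mine, not from the thread): whether a decimal shows scientific "E" notation is a formatting choice, separate from the stored value, which is roughly what the script-based fix exploits. In Python:

```python
from decimal import Decimal

value = Decimal("1.2E-8")   # small enough that str() keeps the exponent
print(str(value))           # 1.2E-8  -- the "ugly" scientific form
print(format(value, "f"))   # 0.000000012  -- same value, fixed-point form
```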
03-10-2022
10:52 AM
@RamaS Check out this documentation: https://docs.cloudera.com/cdp-private-cloud-base/7.1.7/replication-manager/topics/rm-dc-backup-and-disaster-recovery.html Cloudera Manager provides an integrated, easy-to-use management solution to enable data protection on the Hadoop platform. Replication Manager enables you to replicate data across data centers for disaster recovery scenarios. Replications can include data stored in HDFS, data stored in Hive tables, Hive metastore data, and Impala metadata (catalog server metadata) associated with Impala tables registered in the Hive metastore. When critical data is stored on HDFS, Cloudera Manager helps to ensure that the data is available at all times, even in case of complete shutdown of a data center.
03-10-2022
10:47 AM
@kumsath This may not be the best way, but check out this solution here: https://community.cloudera.com/t5/Support-Questions/How-correct-convert-from-Decimal-with-E-to-float-in-Apache/m-p/236511