Member since 02-01-2022 | 270 Posts | 96 Kudos Received | 59 Solutions
My Accepted Solutions
Views | Posted
---|---
2178 | 06-12-2024 06:43 AM
3291 | 04-12-2024 06:05 AM
2209 | 12-07-2023 04:50 AM
1340 | 12-05-2023 06:22 AM
2261 | 11-28-2023 10:54 AM
06-20-2023
06:42 AM
3 Kudos
@drewski7 This blog is a great place to start: https://blog.cloudera.com/benchmarking-nifi-performance-and-scalability/

That said, here are some recommendations:
- Use 3 nodes.
- Use 32 or 64 GB of RAM per node. With 64 GB, set the minimum JVM heap to 16 GB and the maximum to 32 GB, and let NiFi and the operating system use the other 32 GB.
- Add more cores and tune the thread count (Maximum Timer Driven Thread Count in Controller Settings) accordingly.
- Be careful about which processors run on the Primary Node only and which run on all nodes.
- Do not over-load-balance queues. Load balance at the top of the flow, then let NiFi distribute the workload naturally after that.
- Tune processor Concurrent Tasks and Run Schedule, and be sure to understand how each one works.

With a good setup tuned as above, have a plan for identifying when it is time to scale horizontally (add more nodes).

Here are some more docs that get specific about sizing:
https://docs.cloudera.com/cfm/2.1.1/nifi-sizing/topics/cfm-sizing-recommendations.html
https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.5.2/nifi-configuration-best-practices/content/configuration-best-practices.html
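As a minimal sketch of the heap recommendation, assuming the standard NiFi 1.x `conf/bootstrap.conf` layout where the JVM heap is set on lines like `java.arg.2=-Xms512m` and `java.arg.3=-Xmx512m` (check your own file), the helper below rewrites only those two values. The function name `set_heap` is hypothetical, and the 16g/32g defaults simply mirror the min 16 / max 32 advice.

```python
import re

def set_heap(bootstrap_text: str, min_gb: int = 16, max_gb: int = 32) -> str:
    """Return bootstrap.conf text with the -Xms/-Xmx JVM args replaced.

    Assumes heap lines of the form java.arg.N=-XmsVALUE / java.arg.N=-XmxVALUE.
    """
    if min_gb > max_gb:
        raise ValueError("minimum heap cannot exceed maximum heap")
    # Keep whichever java.arg.N key is already used; only replace the value.
    text = re.sub(r"^(java\.arg\.\d+)=-Xms\S+$", rf"\1=-Xms{min_gb}g",
                  bootstrap_text, flags=re.MULTILINE)
    text = re.sub(r"^(java\.arg\.\d+)=-Xmx\S+$", rf"\1=-Xmx{max_gb}g",
                  text, flags=re.MULTILINE)
    return text

# Example input shaped like the default heap section of bootstrap.conf.
default = "# JVM memory settings\njava.arg.2=-Xms512m\njava.arg.3=-Xmx512m\n"
print(set_heap(default))
```

NiFi reads bootstrap.conf at startup, so each node needs a restart before the new heap settings take effect.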
06-20-2023
06:31 AM
@Phil_I_AM You should be able to use InvokeHTTP to build any REST API call: https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-standard-nar/1.17.0/org.apache.nifi.processors.standard.InvokeHTTP/index.html The approach I recommend is to start with a fully working Postman API call with a known URL, GET/POST parameters, and the required authentication headers. With that working call and its details in hand, duplicate the setup in InvokeHTTP until it is operational.
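To pin down exactly what has to be copied from Postman into InvokeHTTP (method, full URL including query parameters, and headers), a sketch like the following builds the same request with Python's standard library without sending it. The endpoint, parameters, and token are placeholders, and `build_request` is a hypothetical helper. In InvokeHTTP, the method and URL go into the processor's method and URL properties, and extra request headers such as `Authorization` can be added as dynamic properties, which the processor sends as headers.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_request(method: str, base_url: str, params: dict, headers: dict) -> Request:
    """Build (but do not send) the request that already works in Postman."""
    # Query parameters are appended to the URL; InvokeHTTP likewise takes the
    # full URL, query string included.
    url = f"{base_url}?{urlencode(params)}" if params else base_url
    return Request(url, headers=headers, method=method)

req = build_request(
    "GET",
    "https://api.example.com/v1/items",   # placeholder endpoint
    {"page": 1, "size": 50},              # placeholder query parameters
    {"Authorization": "Bearer <token>"},  # placeholder auth header
)
print(req.get_method(), req.full_url)
print(req.headers)
```

Passing `req` to `urllib.request.urlopen` would actually send it; once that succeeds outside NiFi, the same three pieces can be entered in InvokeHTTP.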
06-20-2023
06:23 AM
2 Kudos
@JoseRoque HDP downloads are behind a paywall. Additionally, HDP is no longer supported, so I highly recommend that you check out CDP. You can still find an HDP Sandbox in Docker: https://www.cloudera.com/tutorials/sandbox-deployment-and-install-guide/3.html Pay attention to the notice at the top of that page: "As of January 31, 2021, this tutorial references legacy products that no longer represent Cloudera’s current product offerings. Please visit recommended tutorials: How to Create a CDP Private Cloud Base Development Cluster; All Cloudera Data Platform (CDP) related tutorials"
06-16-2023
05:39 AM
@nuxeo-nifi First, some suggestions to help us respond better:
- Include a screenshot of your entire flow.
- Include as much detail as possible about how certain parts are done. For example: how is the CSV processed?
- Indicate what you have tried or what you see "toward the end of processing", including details of what you expect. For example: a single update statement with failure and success counts, or inserting failures into one table and errors into another.

Without this, we have to make assumptions that could result in an inaccurate solution, or turn the post into a long, drawn-out dialogue instead of a simple question with a direct answer.

Making those assumptions, I assume that at the bottom of your flow you have a success and a failure relationship. One suggestion would be to use MergeRecord/MergeContent to obtain the counts, then perhaps ReplaceText to shape the counts into a flowfile with the correct content, and route it to an ExecuteSQL processor to execute your SQL statements.

An alternative would be to send errors and successes to separate ExecuteSQL processors, so that each flowfile simply executes a SQL statement that increments the existing count. This avoids the need to merge and compute totals. For example, one statement in each ExecuteSQL (your_table is a placeholder):
UPDATE your_table SET success = success + 1 WHERE tablename = 'something'
UPDATE your_table SET errors = errors + 1 WHERE tablename = 'something'
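The per-flowfile increment idea can be tried locally before wiring up ExecuteSQL. The sketch below uses Python's built-in sqlite3 as a stand-in database; the table `flow_counts`, its columns, and the `record_outcome` helper are hypothetical, and each call plays the role of one flowfile arriving at the success or failure ExecuteSQL processor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE flow_counts (tablename TEXT PRIMARY KEY, "
    "success INTEGER NOT NULL DEFAULT 0, errors INTEGER NOT NULL DEFAULT 0)"
)
# The counter row must exist up front: UPDATE does not create it.
conn.execute("INSERT INTO flow_counts (tablename) VALUES ('something')")

def record_outcome(conn: sqlite3.Connection, tablename: str, ok: bool) -> None:
    """Run the single UPDATE that one flowfile would trigger."""
    # Column names cannot be bound as parameters, so choose between two fixed statements.
    sql = ("UPDATE flow_counts SET success = success + 1 WHERE tablename = ?"
           if ok else
           "UPDATE flow_counts SET errors = errors + 1 WHERE tablename = ?")
    conn.execute(sql, (tablename,))
    conn.commit()

# Simulate three successful flowfiles and one failed flowfile.
for outcome in (True, True, False, True):
    record_outcome(conn, "something", outcome)

print(conn.execute(
    "SELECT success, errors FROM flow_counts WHERE tablename = 'something'"
).fetchone())
```

Because each increment is a single UPDATE statement on the row, the database applies it atomically, which is generally safer than reading the count, adding to it elsewhere, and writing it back.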
06-15-2023
09:36 AM
@Ray82 Yes, you can achieve this with UpdateRecord. You will need to provide a record reader and writer with the schemas of your upstream and downstream data. Then, in UpdateRecord, you explicitly add a property (+) for each record value you want to update, rather than using a SQL statement as in QueryRecord. Here are some useful community posts on this topic: https://community.cloudera.com/t5/Community-Articles/Update-the-Contents-of-FlowFile-by-using-UpdateRecord/ta-p/248267 https://community.cloudera.com/t5/Support-Questions/NiFi-UpdateRecord-processor-is-not-updating-JSON-path/m-p/186256
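To illustrate the property-per-field idea, here is a small Python simulation (not NiFi code) of what UpdateRecord does with the Literal Value replacement strategy: each dynamic property name is a RecordPath such as `/status`, and its value becomes the new value of that field in every record. The helper `update_records` and the field names are made up; only simple top-level paths are handled, and only fields that already exist are updated. In NiFi, the reader and writer you configure take care of parsing and serializing the records.

```python
import json

def update_records(records: list[dict], properties: dict[str, str]) -> list[dict]:
    """Simulate UpdateRecord (Literal Value strategy) for top-level paths like /status."""
    updated = []
    for record in records:
        new_record = dict(record)  # copy so the incoming records are not modified
        for path, value in properties.items():
            field = path[1:]
            if not path.startswith("/") or not field or "/" in field:
                raise ValueError(f"this sketch only supports top-level paths like /field: {path}")
            if field in new_record:  # this sketch only updates fields that already exist
                new_record[field] = value
        updated.append(new_record)
    return updated

incoming = json.loads('[{"id": 1, "status": "new"}, {"id": 2, "status": "new"}]')
# Like adding one dynamic property to UpdateRecord: name "/status", value "processed"
print(json.dumps(update_records(incoming, {"/status": "processed"})))
```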
06-15-2023
08:54 AM
@MOUROU I recently built a NiFi flow in version 1.21 that uses the NiFi API from within NiFi, and it was NOT necessary to get an access token. From within NiFi, I was able to start using the API calls I needed right away. It would be worth checking whether 1.16 behaves the same way. That flow is here: https://github.com/cldr-steven-matison/NiFi-Templates/blob/main/NiFi_Template_XML_to_Flow_Definition_JSON.json
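Whether a token is needed depends on how the NiFi instance is secured. As a hedged sketch (standard library only, nothing is sent), the hypothetical helper below builds a request against NiFi's REST API under `/nifi-api` and attaches an `Authorization: Bearer …` header only when a token is supplied. On instances that do require a token, it is normally obtained from `POST /nifi-api/access/token` with a username and password. The host, port, and the `/flow/about` path are examples; adjust them to your instance.

```python
from typing import Optional
from urllib.request import Request

def nifi_api_request(base_url: str, path: str, token: Optional[str] = None) -> Request:
    """Build a GET request for a NiFi REST API path; add a bearer token only if given."""
    url = base_url.rstrip("/") + "/nifi-api/" + path.lstrip("/")
    headers = {"Accept": "application/json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return Request(url, headers=headers, method="GET")

# Example host/port; use whatever your NiFi instance actually listens on.
req = nifi_api_request("http://localhost:8080", "/flow/about")
print(req.full_url, req.has_header("Authorization"))
```

Trying the call first without a token, and adding one only if NiFi answers with 401, is a quick way to see how 1.16 behaves compared with 1.21.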
06-14-2023
07:10 AM
@Fredb This is a very difficult one to solve. You asked: "Does anyone know what would cause the execution of the sample_Import_Load.bat to run correctly from the windows command prompt, but fail when executed via the ExecuteStreamCommand processor with these errors?" This is most likely caused by permission issues. NiFi requires specific permissions on the files and scripts that its processors touch or execute. The error is saying that the processor cannot find any of the resources it needs to run that .bat file. I do not have any experience with NiFi on Windows, other than to avoid it, but the solution is likely the same as on other operating systems: make sure the NiFi user has full ownership of the file(s). Additionally, you can sometimes find more detailed errors in the nifi-app.log file while testing, and/or by raising the log level for the processor.
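One quick way to test the permission theory is to run a check as the same account that runs NiFi. The sketch below uses only Python's standard library; `check_access` is a hypothetical helper and the path is a placeholder for your sample_Import_Load.bat. On Windows, `os.access` looks only at basic file attributes, not full NTFS ACLs, so a passing result there is a hint rather than proof.

```python
import os
from pathlib import Path

def check_access(path: str) -> dict:
    """Report whether the current user can see, read, and execute a file and list its folder."""
    p = Path(path)
    return {
        "file_exists": p.exists(),
        "file_readable": os.access(p, os.R_OK),
        "file_executable": os.access(p, os.X_OK),
        "parent_listable": os.access(p.parent, os.R_OK | os.X_OK),
    }

# Placeholder path: point it at the script that ExecuteStreamCommand runs.
print(check_access(r"C:\scripts\sample_Import_Load.bat"))
```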
06-14-2023
07:05 AM
1 Kudo
@rupeshh Docker containers are never fun, for this and other reasons. I still think the file is missing permissions. I cannot see the file ownership in your listing, and I cannot see the path in the error. Either way, the error definitely suggests that the processor does not see the file. One suggestion would be to use the nifi user and the CLI in the Docker container to ls the directory and files. If that user cannot see the files, that confirms what the error states (the directory path or file does not exist, or is not visible due to permissions).
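The same check can be scripted from inside the container as the nifi user. The sketch below is a stand-in for `ls -l` using only the standard library: `list_visible` is a hypothetical helper that prints each entry's numeric owner uid/gid and mode and flags entries the current user cannot read. The directory path is a placeholder; compare the numeric ids with the output of `id nifi`.

```python
import os
import stat

def list_visible(directory: str) -> list[tuple[str, int, int, str, bool]]:
    """Like `ls -l`: (name, uid, gid, mode, readable) for each entry, as the current user."""
    if not os.access(directory, os.R_OK | os.X_OK):
        raise PermissionError(f"current user cannot list {directory}")
    with os.scandir(directory) as it:
        entries = sorted(it, key=lambda e: e.name)
    rows = []
    for entry in entries:
        st = entry.stat(follow_symlinks=False)
        rows.append((entry.name, st.st_uid, st.st_gid,
                     stat.filemode(st.st_mode), os.access(entry.path, os.R_OK)))
    return rows

print(f"running as uid={os.getuid()} gid={os.getgid()}")  # POSIX only
target = "/some/path"  # placeholder: the mounted directory NiFi should read
if os.path.isdir(target):
    for row in list_visible(target):
        print(row)
else:
    print(f"{target} does not exist or is not visible to this user")
```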
06-13-2023
12:22 PM
@rupeshh For NiFi to see the mounted directory or the files within it, they need to be owned by the same user that runs NiFi. For example (-R also applies the ownership to the files within):
chown -R nifi:nifi /some/path
Then NiFi will be able to see the directories and files.
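If you prefer to script the ownership fix (for example in a container entrypoint), a Python equivalent of a recursive chown is sketched below. `chown_recursive` is a hypothetical helper; it accepts either names such as `nifi` or numeric ids, must run as root (or a user allowed to change ownership), and uses `nifi:nifi` and `/some/path` only as the example values from above.

```python
import grp
import os
import pwd
from typing import Union

def _to_uid(user: Union[str, int]) -> int:
    # Accept a user name ("nifi") or a numeric uid.
    return user if isinstance(user, int) else pwd.getpwnam(user).pw_uid

def _to_gid(group: Union[str, int]) -> int:
    # Accept a group name ("nifi") or a numeric gid.
    return group if isinstance(group, int) else grp.getgrnam(group).gr_gid

def chown_recursive(root: str, user: Union[str, int], group: Union[str, int]) -> int:
    """Change owner/group of root and everything below it; return how many paths changed."""
    uid, gid = _to_uid(user), _to_gid(group)
    os.chown(root, uid, gid)
    changed = 1
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            # follow_symlinks=False changes a symlink itself, not its target.
            os.chown(os.path.join(dirpath, name), uid, gid, follow_symlinks=False)
            changed += 1
    return changed

# Example (run as root): chown_recursive("/some/path", "nifi", "nifi")
```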
06-13-2023
06:55 AM
1 Kudo
@wert_1311 Your error indicates that two of your roles are not set up completely:

1. Data Access Role (arn:aws:iam::8859X2XX911XX:role/Cloudera-datalake-admin-role) is not set up correctly. Please follow the official documentation on required policies for Data Access Role. Missing policies (chunked): arn:aws:iam::8859X2XX911XX:role/Cloudera-datalake-admin-role:s3:AbortMultipartUpload:arn:aws:s3:::cdp-my-bucket/hive_replica_functions_dir/*
2. Data Access Role (arn:aws:iam::8859X2XX911XX:role/Cloudera-ranger-audit-role) is not set up correctly. Please follow the official documentation on required policies for Data Access Role. Missing policies (chunked): arn:aws:iam::8859X2XX911XX:role/Cloudera-ranger-audit-role:s3:PutObject:arn:aws:s3:::cdp-my-bucket/ranger/audit/*

Go back to the quickstart and the docs, and make sure you have completed all of the setup steps. Here is a link with more about the credentials: https://docs.cloudera.com/cdp-public-cloud/cloud/requirements-aws/topics/mc-aws-req-credential.html

The page above describes the steps for the two roles named in your error. The end of each error message indicates which policies are missing.
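Since the end of each message encodes the role, the missing action, and the resource, a small parser can turn those entries into IAM policy documents to compare against what is attached to each role. This is a sketch under assumptions: `missing_policy_statements` is a hypothetical helper, the regex assumes the `<role ARN>:<service>:<Action>:<resource ARN>` shape seen in this error, and it covers only the actions the error lists. The documented policies for these roles include more than that, so the docs remain the reference. The account ID stays masked, as in the original error.

```python
import json
import re

# Matches "<role ARN>:<service>:<Action>:<resource ARN>" as printed in the CDP error.
MISSING = re.compile(r"^(arn:aws:iam::[^:]*:role/[^:]+):([a-z0-9-]+:[A-Za-z]+):(arn:.+)$")

def missing_policy_statements(lines: list[str]) -> dict[str, dict]:
    """Group missing actions by role and build one IAM policy document per role."""
    policies: dict[str, dict] = {}
    for line in lines:
        m = MISSING.match(line.strip())
        if not m:
            raise ValueError(f"unrecognized entry: {line}")
        role, action, resource = m.groups()
        doc = policies.setdefault(role, {"Version": "2012-10-17", "Statement": []})
        doc["Statement"].append({"Effect": "Allow", "Action": action, "Resource": resource})
    return policies

errors = [
    "arn:aws:iam::8859X2XX911XX:role/Cloudera-datalake-admin-role:s3:AbortMultipartUpload:arn:aws:s3:::cdp-my-bucket/hive_replica_functions_dir/*",
    "arn:aws:iam::8859X2XX911XX:role/Cloudera-ranger-audit-role:s3:PutObject:arn:aws:s3:::cdp-my-bucket/ranger/audit/*",
]
for role, doc in missing_policy_statements(errors).items():
    print(role)
    print(json.dumps(doc, indent=2))
```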