Member since
11-17-2021
1128
Posts
257
Kudos Received
29
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2975 | 11-05-2025 10:13 AM |
| | 484 | 10-16-2025 02:45 PM |
| | 1042 | 10-06-2025 01:01 PM |
| | 822 | 09-24-2025 01:51 PM |
| | 629 | 08-04-2025 04:17 PM |
06-10-2024
06:03 PM
@Deejay Welcome to the Cloudera Community! To help you get the best possible solution, I have tagged our NiFi expert @steven-matison who may be able to assist you further. Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
06-08-2024
04:04 AM
1 Kudo
There are no prerequisites for taking any Cloudera certification exam. I recommend taking the CCA175 practice test designed by P2PExams to prepare for the exam. The practice test will help you gain essential exam knowledge and understand key concepts.
06-07-2024
07:31 AM
1 Kudo
@dankh Welcome to the Cloudera Community!
This information has been sent to be updated by the team in charge.
Thank you so much for your contribution!
06-06-2024
08:02 AM
@G_B NiFi cluster deployments expect all nodes in the cluster to have the same hardware specifications. NiFi's Load Balanced connections have no option to customize load balancing based on the current CPU load average of another node. Doing so would require NiFi nodes to continuously ping all other nodes for their current load average before sending FlowFiles, which would impact performance. The only thing that would cause any variation in distribution is a node's receive rate being diminished, but that is out of NiFi's control. Round Robin will skip a node in rotation if the node is unable to receive FlowFiles as fast as another node.

Also keep in mind that a NiFi cluster elects nodes to the roles "cluster coordinator" and "primary node". Sometimes both roles get assigned to the same node. The assignment of these roles can change at any time. The primary node is the only node that will schedule "primary node" only processors to execute. So your one node that is lighter on CPU could also end up assigned this role, adding to its CPU load average. Often CPU load average is impacted not only by volume, but also by the content size of the FlowFiles. The LB connections also do not take FlowFile content size into account when distributing FlowFiles.

While your best option here performance-wise is to make sure all nodes have the same hardware specifications, there are a few less performant options you could try to distribute your data differently.

1. Use a Remote Process Group (RPG), which uses Site-To-Site (S2S) to distribute FlowFiles across your NiFi nodes. I always recommend using an RPG to push to a Remote Input port rather than pull from a Remote Output port to achieve better load distribution. The issue here is that you need to add RPGs and Remote ports everywhere you were previously using LB configured connections.
2. Build a smart data distribution reusable dataflow. You could build a dataflow that sorts FlowFiles by content size range, merges bundles via MergeContent using the FlowFile Stream, v3 merge format, sends bundles based on size range to your various nodes via InvokeHTTP to ListenHTTP, and then uses UnpackContent once received to extract the FlowFile bundle. This MergeContent is going to add additional CPU load.
3. Consider using DistributeLoad, which can be configured with weighted distribution. This allows you to create three distribution relationships with maybe 5 FlowFiles per iteration for relationships 1 and 2, and a third relationship with only 1 per iteration. That way you send 1 to your lower core node for every 5 sent to the other two nodes. You would still need to use UpdateAttribute (to set a custom target node URL), MergeContent, InvokeHTTP, ListenHTTP, and UnpackContent in this flow.

So if addressing your hardware differences is not an option, number 1 is probably your next best choice.

Please help our community thrive. If you found that any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to log in and click "Accept as Solution" on one or more of them that helped.

Thank you,
Matt
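To make the 5:5:1 weighting from option 3 concrete, here is a minimal Python sketch of a weighted rotation. It is not NiFi code and does not claim to reproduce DistributeLoad's actual ordering; the relationship names, the `WEIGHTS` mapping, and both helper functions are hypothetical and only illustrate the resulting ratio.

```python
from collections import Counter
from itertools import cycle

# Hypothetical weights: relationships "1" and "2" feed the two larger nodes,
# relationship "3" feeds the node with fewer cores.
WEIGHTS = {"1": 5, "2": 5, "3": 1}


def weighted_cycle(weights):
    """Return an endless iterator that repeats each name `weight` times per pass."""
    one_pass = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(one_pass)


def distribute(flowfiles, weights=WEIGHTS):
    """Pair every FlowFile with the relationship it would be routed to."""
    targets = weighted_cycle(weights)
    return [(flowfile, next(targets)) for flowfile in flowfiles]


# Two full passes (22 FlowFiles) over the 5 + 5 + 1 rotation.
counts = Counter(relationship for _, relationship in distribute(range(22)))
print(counts)
```

Over any whole number of passes, the smaller node receives one FlowFile for every five sent to each of the other two, regardless of FlowFile content size.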
06-06-2024
02:44 AM
1 Kudo
@Thar11027 Yes, but it will be more complex: you can add a shift operation that uses the "*" wildcard. For example, if you do this:
[
  {
    "operation": "shift",
    "spec": {
      "*": {
        "*/*": "[&1].&", //with a '/' in the middle somewhere
        "/*": "[&1].&", //with a '/' at the start followed by something
        "*/": "[&1].&" //with a '/' at the end following something
      }
    }
  }
]
You will obtain all the desired fields, which makes the manual substitution easier (try it on the Jolt demo site). Note that if you want to fully automate it, it will be a little more difficult, because you would then have to manipulate the field name string. If you are interested in that, I suggest looking at the last example in the guide I already sent you (My Guide). (It may be more appropriate to open another question about this other problem, since the topic has changed and a separate question would be easier to find for someone searching for the same problem.)
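As a rough illustration of what the three patterns in the spec above select, the following Python sketch approximates them with `fnmatch`-style globbing. This is an assumption-laden stand-in, not Jolt itself: Jolt's wildcard semantics and match precedence may differ in edge cases, and the `select_slash_fields` helper and the sample records are hypothetical.

```python
from fnmatch import fnmatchcase

# The three left-hand-side patterns from the shift spec.
SLASH_PATTERNS = ("*/*", "/*", "*/")


def select_slash_fields(records):
    """Keep, for each record, only the fields whose name matches a pattern."""
    return [
        {
            key: value
            for key, value in record.items()
            if any(fnmatchcase(key, pattern) for pattern in SLASH_PATTERNS)
        }
        for record in records
    ]


# Hypothetical input: a list of objects, some with '/' in their field names.
sample = [
    {"id": 1, "a/b": "x", "/lead": "y"},
    {"id": 2, "trail/": "z"},
]
selected = select_slash_fields(sample)
print(selected)
```

Fields without a `/` in their name (here `id`) are dropped, which mirrors how the shift spec only carries the matched keys into the output.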
06-03-2024
03:49 PM
1 Kudo
@sibin Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
05-31-2024
02:17 PM
1 Kudo
@adsejnf Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
05-27-2024
08:39 PM
Hi, I checked "show processlist" and this query is running there, multiple times. The problem is that it is scanning all rows, which means it is not optimized. If it is an auto-generated query, how can we pass the partition information? This query is not run by users; they are running optimized queries. It seems to be fetching some metadata from MySQL, something like below, which is why I think it is a metadata-generated query.

| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | C0 | NULL | const | PRIMARY,UNIQUE_DATABASE,CTLG_FK1 | UNIQUE_DATABASE | 389 | const,const | 1 | 100.00 | Using index |
| 1 | SIMPLE | B0 | NULL | const | PRIMARY,UNIQUETABLE,TBLS_N49 | UNIQUETABLE | 268 | const,const | 1 | 100.00 | Using index |
| 1 | SIMPLE | A0 | NULL | ref | PARTITIONS_N49 | PARTITIONS_N49 | 9 | const | 5555098 | 11.11 | Using where |
05-24-2024
01:16 PM
@Racketmojster Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
05-24-2024
04:04 AM
1 Kudo
It was a Kerberos issue; this is resolved.