MergeContent processor and default block size of HDFS

Expert Contributor

Hi All,

Thanks a lot to this awesome community

We have a high-volume data source (15,000 events per second) feeding MergeContent through ListenTCP. Right now I have three MergeContent processors in series: the first merges up to 32 MB, the second up to 64 MB, and the last up to 128 MB (the default HDFS block size). However, messages are still queuing up on the success connection of ListenTCP. The settings on that success queue are a limit of 100,000 messages and 2 GB.

Should I add two more MergeContent processors in series, to merge to 8 MB and then 16 MB? Also, each MergeContent processor has 2 concurrent tasks.
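For a sense of scale, the numbers in the question can be put through a quick back-of-the-envelope calculation. This is only a sketch: the function name is made up for illustration, and it assumes each event arrives as one FlowFile and that the 100,000 limit is the connection's object-count threshold.

```python
def seconds_until_backpressure(threshold_count: int,
                               events_per_second: float,
                               drain_per_second: float = 0.0) -> float:
    """Estimate how long a queue takes to reach its object-count limit.

    Hypothetical helper: assumes one FlowFile per event and constant rates.
    """
    net_rate = events_per_second - drain_per_second
    if net_rate <= 0:
        # The downstream processor keeps up, so the queue never fills.
        return float("inf")
    return threshold_count / net_rate


if __name__ == "__main__":
    # Worst case from the question: 15,000 events/s in, nothing draining.
    print(round(seconds_until_backpressure(100_000, 15_000), 1))  # 6.7
```

Under those assumptions the queue reaches its limit in under seven seconds whenever the first MergeContent stalls, so even short pauses downstream will show up as queued messages.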

I have applied all the settings for a high-performance cluster that are described in this community article:

https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.h...

Any help or suggestions would be appreciated.

Thanks

Dheeru

1 ACCEPTED SOLUTION

Super Mentor

@dhieru singh

A few things to check...

1. Have you monitored your CPU utilization on your NiFi nodes?
2. If your CPUs are not saturated, what do you have your "Max Timer Driven Thread Count" set to? All your processors must share threads from this resource pool. If the resource pool is too small, processors end up just waiting in line for a thread. So if you have plenty of CPU resources still available, you may want to push this value up. The default is only 10 threads, and the setting can be found via "Controller Settings" under the hamburger menu in the upper right corner of the NiFi UI.

*** A good rule-of-thumb starting point for this setting is 2 to 4 times the number of cores you have on a single node. This configuration is applied per node, so if it is set to 40 and you have 2 nodes, the total thread pool is 80 threads across your 2-node cluster.

3. Adding more MergeContent processors is not likely to make much difference here, but adding more concurrent tasks may help. Just keep in mind the number of FlowFiles (not their size) being merged in each bin, to avoid heap and garbage collection issues that will affect performance.

4. Make sure you have sufficient heap memory to run this flow with minimal partial or full stop-the-world garbage collection events. While young/partial garbage collection is normal and healthy, old/full garbage collection can have a real effect on performance. Heap memory allocations are set in the NiFi bootstrap.conf file.
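The thread pool rule of thumb from point 2 can be sketched as a small calculation. The function names here are hypothetical and only restate the arithmetic above; the right value still depends on what your CPU monitoring shows.

```python
import os
from typing import Tuple


def suggested_thread_range(cores_per_node: int) -> Tuple[int, int]:
    """Rule-of-thumb starting range: 2 to 4 times the cores on one node."""
    return 2 * cores_per_node, 4 * cores_per_node


def cluster_thread_total(max_timer_driven_threads: int, node_count: int) -> int:
    """The setting is applied per node, so the cluster total is a product."""
    return max_timer_driven_threads * node_count


if __name__ == "__main__":
    # os.cpu_count() can return None, so fall back to 1 in that case.
    cores = os.cpu_count() or 1
    low, high = suggested_thread_range(cores)
    print(f"Starting range on this machine: {low} to {high} threads")
    print(cluster_thread_total(40, 2))  # 80, matching the example above
```

Run it on a NiFi node itself if you want the local core count to reflect that node's hardware.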

Thanks,

Matt


2 REPLIES

Expert Contributor

@Matt Clarke Thanks for the response. Yes, I set the Max Timer Driven Thread Count to 40; I might have to increase it. I will set the number of FlowFiles to 15,000 to 20,000, and I should also set the max age so that no FlowFiles are left lingering. I will come back with results.
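As a rough check on those numbers, here is a sketch of when a bin would close under a simplified model: the bin is released either when it reaches the minimum entry count or when it hits the maximum age, whichever comes first. The function name and the 60-second age are illustrative assumptions, not values from this thread, and real MergeContent behavior also depends on size limits and the number of bins.

```python
from typing import Tuple


def bin_close_time(min_entries: int,
                   events_per_second: float,
                   max_bin_age_seconds: float) -> Tuple[float, str]:
    """Return (seconds until the bin closes, which limit triggered it).

    Simplified model: ignores size thresholds and multiple bins.
    """
    if events_per_second <= 0:
        # No input at all: only the age limit can release the bin.
        return max_bin_age_seconds, "age"
    fill_seconds = min_entries / events_per_second
    if fill_seconds <= max_bin_age_seconds:
        return fill_seconds, "count"
    return max_bin_age_seconds, "age"


if __name__ == "__main__":
    # At 15,000 events/s, a 15,000-entry bin fills by count in one second.
    print(bin_close_time(15_000, 15_000, 60.0))  # (1.0, 'count')
    # If the source slows to 100 events/s, the age limit takes over.
    print(bin_close_time(20_000, 100, 60.0))  # (60.0, 'age')
```

At the full input rate the entry count is what closes the bin, so the max age only matters when the source slows down or stops.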

Thanks

Dheeru