Created 05-08-2021 12:19 AM
Hi,
I'm running a 6-node Hadoop cluster on HDP 3.1.4 and have run into a strange situation, so could anyone please help me out?
In the Ambari Web UI, the three NodeManager nodes all look fine. However, when we submit an application to YARN, every application fails.
While investigating the logs, I found that the Web UI of one of the NodeManager nodes was not responding and the following message was being logged.
WARN org.apache.hadoop.util.SysInfoLinux: Couldn't read /proc/meminfo: can't determine memory settings
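If it helps with diagnosis, these are the basic checks I plan to run on the affected node the next time it gets into this state, to confirm whether /proc/meminfo is readable from a shell and whether the kernel is reporting memory pressure (I have not captured their output yet):
$ head -n 5 /proc/meminfo
$ free -m
$ dmesg | grep -iE 'oom|memory' | tail -n 20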
Restarting the NodeManager service resolves the situation for a while, but it keeps coming back.
Has anyone faced the same problem? Any comments are highly appreciated.
Thank you in advance.
Best Regards,
Yosuke
Created 05-09-2021 03:56 AM
/proc/meminfo reports the amount of free and used memory (both physical and swap) on the system, as well as the shared memory and buffers used by the kernel. You can capture it from the faulty node like this:
$ cat /proc/meminfo >> /tmp/faultynode.txt
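Then do the same on a healthy node and diff the two files (the working-node file name here is just a placeholder):
$ cat /proc/meminfo >> /tmp/workingnode.txt
$ diff /tmp/faultynode.txt /tmp/workingnode.txt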
Since this is related to memory settings, can you compare the Transparent Huge Pages (THP) settings between the faulty and the working NodeManager/DataNode hosts?
My advice is to review the "Prepare the Environment" steps in the HDP documentation.
Here is an extract from my VirtualBox node:
# cat /sys/kernel/mm/transparent_hugepage/enabled
always madvise [never]
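If THP turns out to be enabled on the faulty node, one common way to disable it at runtime on RHEL/CentOS is shown below; adjust for your OS and make it persistent (for example via /etc/rc.d/rc.local or a tuned profile) so it survives reboots:
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# echo never > /sys/kernel/mm/transparent_hugepage/defrag
# cat /sys/kernel/mm/transparent_hugepage/enabled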
Please revert with your findings.
Created 05-11-2021 01:48 PM
Thank you for the response.
I haven't checked the settings yet, but we have not explicitly configured THP to be disabled. I googled and found some articles saying that THP is not recommended for Hadoop workloads.
I think you are pointing out that our NodeManager went wrong because some workload occurred that the node running the NodeManager couldn't handle. Is my understanding correct?
I will apply this setting in our test environment first. Since I don't know how this situation arose, I will need some time to verify it. I will post again with any progress or additional questions.
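As a rough plan, once the test cluster is reconfigured I will check the setting on every NodeManager host before re-submitting the jobs, along these lines (the host names are placeholders):
$ for h in nm-node1 nm-node2 nm-node3; do echo "== $h =="; ssh "$h" 'cat /sys/kernel/mm/transparent_hugepage/enabled'; done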
Thank you very much again.
Best regards,
Yosuke