HDP - Datanodes are crashing

Explorer

We have a 4-node Hadoop cluster: 2 master nodes and 2 datanodes. After some time we find that our datanodes are failing, and when we check the logs they always say the process cannot allocate memory.

ENV

HDP version 2.3.6
HAWQ version 2.0.0
Linux OS: CentOS 6.0

The datanodes are crashing with the following error in the logs:

os::commit_memory(0x00007fec816ac000, 12288, 0) failed; error='Cannot allocate memory' (errno=12)

Memory Info

vm.overcommit_memory is set to 2

DataNode heap size: 2 GB

NameNode heap size: 2 GB

MemTotal:       30946088 kB
MemFree:        11252496 kB
Buffers:          496376 kB
Cached:         11938144 kB
SwapCached:            0 kB
Active:         15023232 kB
Inactive:        3116316 kB
Active(anon):    5709860 kB
Inactive(anon):   394092 kB
Active(file):    9313372 kB
Inactive(file):  2722224 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      15728636 kB
SwapFree:       15728636 kB
Dirty:               280 kB
Writeback:             0 kB
AnonPages:       5705052 kB
Mapped:           461876 kB
Shmem:            398936 kB
Slab:             803936 kB
SReclaimable:     692240 kB
SUnreclaim:       111696 kB
KernelStack:       33520 kB
PageTables:       342840 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    31201680 kB
Committed_AS:   26896520 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       73516 kB
VmallocChunk:   34359538628 kB
HardwareCorrupted:     0 kB
AnonHugePages:   2887680 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        6132 kB
DirectMap2M:     2091008 kB
DirectMap1G:    29360128 kB
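
A quick way to see how close a node is to the kernel's commit limit when this error appears (a rough check, assuming the standard /proc interfaces on CentOS 6):

# Compare committed memory against the limit and show the current overcommit settings
grep -E 'CommitLimit|Committed_AS' /proc/meminfo
cat /proc/sys/vm/overcommit_memory   # 0 = heuristic, 1 = always allow, 2 = strict accounting
cat /proc/sys/vm/overcommit_ratio    # % of RAM counted towards the limit when overcommit_memory=2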

Super Mentor

@vij singh

Looks like the datanodes might be crashing because of the following setting:

vm.overcommit_memory is set to 2

Please check the file

/proc/sys/vm/overcommit_memory

Suggestion: this memory-related crash appears to be caused by an OS-level setting: the memory overcommit setting is set to 2, whereas it should be set to 0. You can change it as follows:

echo 0 > /proc/sys/vm/overcommit_memory
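
That change takes effect immediately but is lost on reboot. A minimal sketch of making it persistent on CentOS 6 (assuming the stock /etc/sysctl.conf is used for kernel settings on your nodes):

# Persist the setting across reboots
echo "vm.overcommit_memory = 0" >> /etc/sysctl.conf
sysctl -p   # reload settings from /etc/sysctl.conf and confirm the value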

Background of this setting:
Please refer to the following doc to know more about it: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Gui...

overcommit_memory defines the conditions that determine whether a large memory request is accepted or denied. There are three possible values for this parameter:

0 — The default setting. The kernel performs heuristic memory overcommit handling by estimating the amount of memory available and failing requests that are blatantly invalid. Unfortunately, since memory is allocated using a heuristic rather than a precise algorithm, this setting can sometimes allow available memory on the system to be overloaded.

1 — The kernel performs no memory overcommit handling. Under this setting, the potential for memory overload is increased, but so is performance for memory-intensive tasks.

2 — The kernel denies requests for memory equal to or larger than the sum of total available swap and the percentage of physical RAM specified in overcommit_ratio. This setting is best if you want a lesser risk of memory overcommitment.
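
To make the value-2 behaviour concrete with the numbers posted above (a rough sanity check, assuming the overcommit ratio on that node is the default 50):

# With overcommit_memory=2: CommitLimit = SwapTotal + (overcommit_ratio/100) * MemTotal
# From the meminfo in the question:
#   15728636 kB (swap) + 50% of 30946088 kB (RAM) = 15728636 + 15473044 = 31201680 kB
# which matches the posted CommitLimit. Committed_AS is already 26896520 kB, so only
# roughly 4 GB of commit headroom is left for everything else on the node; once it is
# used up, even a 12288-byte thread-stack mmap from the JVM is refused (errno=12).
grep -E 'MemTotal|SwapTotal|CommitLimit|Committed_AS' /proc/meminfo
cat /proc/sys/vm/overcommit_ratio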


Explorer

Thanks for the quick response. Going to change overcommit from 2 to 0.

Super Mentor

@vij singh

Are you still facing the same issue, or did making the recommended changes work?

Contributor

Since you tagged this question with HAWQ, I'm guessing you installed HAWQ on this cluster. One likely reason this is happening is that the HAWQ Ambari install sets your datanodes (where HAWQ is installed) to use an overcommit value of 2, with a default overcommit ratio of 50%, which you are supposed to change based on your memory configuration. This ratio should ideally be 90% or more. With 50%, your services most likely did not get to use even half of the datanode RAM.

You will find this config under the HAWQ service as a slider control (if overcommit is set to 2). You can either change the overcommit value to 0 on the HAWQ segment nodes, or leave it at 2 with a ratio of 90% or higher. You should update it via Ambari rather than directly at the OS level. It is strongly recommended to run at least the HAWQ master node with an overcommit value of 2; you can do this by putting it on a dedicated node and creating a separate config group for the HAWQ master(s). Hope this helps.
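
For what it's worth, a quick way to verify what actually landed on a segment node after the change (a sketch; these are the standard kernel settings, not anything HAWQ-specific):

# Check the overcommit settings the Ambari config is expected to manage
sysctl vm.overcommit_memory vm.overcommit_ratio
# With overcommit_memory=2 and a 90% ratio, CommitLimit should now be SwapTotal + 90% of MemTotal
grep -E 'CommitLimit|Committed_AS' /proc/meminfo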
