DataNode disks are full because of huge stdout files

Hi all,

We have seen a few cases where the disks on a DataNode machine became 100% full because the stdout files are huge.

For example:

/grid/sdb/hadoop/yarn/log/application_151746342014_5807/container_e37_151003535122014_5807_03_000001/stdout

From df -h we can see:

df -h /grid/sdb
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        1.8T  1.8T  0T   100% /grid/sdb

Any suggestions on how to keep the stdout files from growing this large? This issue actually causes the HDFS component on the DataNode to stop.

Second, since the path of stdout is:

/var/log/hadoop-yarn/containers/[application id]/[container id]/stdout

is it possible to limit the file size, or to purge stdout when the file reaches a threshold?

Michael-Bronson
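One possible workaround for the purging idea, sketched in Python under stated assumptions: a periodic job (for example run from cron) that walks the container log directory and truncates stdout/stderr files that exceed a threshold. The 1 GiB limit, the function names, and the `<log root>/<application id>/<container id>/<file>` layout are assumptions for illustration; this is not a built-in YARN feature.

```python
import os
from pathlib import Path

# Hypothetical threshold: 1 GiB per file. Adjust to fit your disks.
LIMIT_BYTES = 1 * 1024**3


def find_oversized_logs(log_root, limit_bytes=LIMIT_BYTES, names=("stdout", "stderr")):
    """Return container log files under log_root that are larger than limit_bytes.

    Assumes the layout <log_root>/<application id>/<container id>/<file>.
    """
    oversized = []
    for path in Path(log_root).glob("*/*/*"):
        if path.name in names and path.is_file() and path.stat().st_size > limit_bytes:
            oversized.append(path)
    return sorted(oversized)


def truncate_logs(paths):
    """Truncate each file to zero bytes in place.

    Truncating (instead of deleting) releases the space even while the
    container still holds the file open; deleting an open file does not
    free the space until the writing process closes it.
    """
    for path in paths:
        os.truncate(path, 0)
```

For example, `truncate_logs(find_oversized_logs("/var/log/hadoop-yarn/containers"))`. If the writing process did not open the file in append mode, its next writes land at the old offset and leave a sparse file behind, so it is worth testing this on your own workload before relying on it.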

Master Mentor

@Michael Bronson

What value is set for the "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" property in your YARN configuration?
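The value can also be read directly from yarn-site.xml on the NodeManager host. A minimal Python sketch follows; the file path you pass in (for example /etc/hadoop/conf/yarn-site.xml) is an assumption that varies by installation, and the function names are hypothetical. A property missing from the file means YARN falls back to its default from yarn-default.xml.

```python
import xml.etree.ElementTree as ET

# Properties discussed in this thread.
DISK_CHECK_KEYS = (
    "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage",
    "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb",
    "yarn.nodemanager.log-dirs",
)


def read_hadoop_config(path):
    """Parse a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.parse(path).getroot()
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            conf[name.strip()] = (value or "").strip()
    return conf


def disk_check_settings(conf):
    """Pick out the disk-check properties; None means 'not set, default applies'."""
    return {key: conf.get(key) for key in DISK_CHECK_KEYS}
```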

-> This is the maximum percentage of disk space utilization allowed, after which a disk is marked as bad. Values can range from 0.0 to 100.0. If the value is greater than or equal to 100, the NodeManager checks for a full disk. This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs. The default value is 90.0%. Hence, either clean up the disk that the unhealthy node is running on, or increase the threshold in yarn-site.xml.
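As a rough illustration of this threshold, here is a simplified Python model of the utilization check; it is not the NodeManager's actual code. Utilization is taken as `100 * (total - free) / total`, and a disk counts as bad when it exceeds the configured percentage or is completely full. The function names are hypothetical.

```python
import shutil


def used_percentage(total_bytes, free_bytes):
    """Disk utilization in percent: 100 * (total - free) / total."""
    return 100.0 * (total_bytes - free_bytes) / total_bytes


def exceeds_utilization_limit(total_bytes, free_bytes, max_pct=90.0):
    """Simplified model: bad if utilization is above max_pct or the disk is full."""
    pct = used_percentage(total_bytes, free_bytes)
    return pct > max_pct or pct >= 100.0


def check_dir(path, max_pct=90.0):
    """Apply the check to the filesystem that holds `path`."""
    usage = shutil.disk_usage(path)
    return exceeds_utilization_limit(usage.total, usage.free, max_pct)
```

With the default of 90.0, a disk that is 100% full, like /grid/sdb in the question, is reported as bad. `df` treats reserved blocks differently when computing Use%, so the two percentages may not match exactly.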

In Ambari: YARN -> Configs -> Advanced yarn-site, then check "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage".

"yarn.nodemanager.log-dirs": It is always best to make sure a dedicated disk is allocated for container logs. Check the path set in the property "yarn.nodemanager.log-dirs" and move it to a dedicated disk with enough free space. This property determines where container logs are stored on the node while the containers are running.

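To verify whether the log directories really sit on a dedicated disk, one option is to compare the device IDs of the log dirs with those of the DataNode data directories (dfs.datanode.data.dir). This Python sketch assumes both values are plain comma-separated local paths that exist on the node; entries with a storage-type prefix such as [DISK] or a file:// URI would need to be stripped first. The function names are hypothetical.

```python
import os


def split_dirs(value):
    """Split a comma-separated directory list such as yarn.nodemanager.log-dirs."""
    return [d.strip() for d in value.split(",") if d.strip()]


def same_filesystem(path_a, path_b):
    """True if both paths are on the same mounted filesystem (same st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def shared_log_dirs(log_dirs_value, data_dirs_value):
    """Return (log_dir, data_dir) pairs that share a filesystem with HDFS data."""
    pairs = []
    for log_dir in split_dirs(log_dirs_value):
        for data_dir in split_dirs(data_dirs_value):
            if same_filesystem(log_dir, data_dir):
                pairs.append((log_dir, data_dir))
    return pairs
```

An empty result means no log dir shares a filesystem with a data dir; any pair returned is a place where full container logs can also starve HDFS.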

Also, please check the value of the property "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb". Its default value is 0. It is the minimum space that must be available on a disk for the disk to be used. This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.
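In the same simplified spirit, the minimum-free-space rule is a plain comparison against a value given in MB. This Python sketch treats 1 MB as 1024 * 1024 bytes and uses hypothetical function names; it only illustrates the arithmetic and is not taken from YARN itself.

```python
import shutil

BYTES_PER_MB = 1024 * 1024


def below_min_free(free_bytes, min_free_mb):
    """True if free space is below the configured minimum.

    With the default of 0 this never triggers, since free space cannot be negative.
    """
    return free_bytes < min_free_mb * BYTES_PER_MB


def dir_below_min_free(path, min_free_mb):
    """Apply the check to the filesystem that holds `path`."""
    return below_min_free(shutil.disk_usage(path).free, min_free_mb)
```

For example, with the property set to 1000, a disk reporting 0 bytes available gives `below_min_free(0, 1000) == True`.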

@Jay, the value is yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage=90

Michael-Bronson

In my Ambari, yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb=1000M

Michael-Bronson

@Jay, just to mention: we want to limit the size of stdout or stderr. Is that possible? For example, we might want to limit each file to 1G.

Michael-Bronson
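Capping the file at the source is only an option if you can change the command a container runs, because typical launch commands redirect the container's stdout straight to `<LOG_DIR>/stdout`. Under that assumption, a hypothetical Python filter could sit in a pipe in front of the redirect, passing at most a fixed number of bytes through and discarding the rest. It keeps reading after the cap so the producing process never blocks on a full pipe.

```python
def copy_capped(src, dst, limit_bytes, chunk_size=64 * 1024):
    """Copy binary stream src to dst, writing at most limit_bytes.

    Everything past the cap is read and discarded so the upstream writer
    keeps running. Returns the number of bytes written.
    """
    # Prefer read1 so data is forwarded as soon as it arrives rather than
    # waiting for a full chunk; fall back to read for other file objects.
    read = getattr(src, "read1", src.read)
    written = 0
    while True:
        chunk = read(chunk_size)
        if not chunk:
            break  # EOF: the upstream process closed its stdout
        room = limit_bytes - written
        if room > 0:
            piece = chunk[:room]
            dst.write(piece)
            dst.flush()
            written += len(piece)
    return written
```

A wrapper script (hypothetical name cap_output.py) would call `copy_capped(sys.stdin.buffer, sys.stdout.buffer, 1024**3)` and be used as `your_app | python3 cap_output.py 1><LOG_DIR>/stdout`. The trade-off is that all output after the first 1 GiB is lost.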

@Jay any update?

Michael-Bronson