
DataNode disks are full because of huge stdout files

Hi all,

We have seen a few cases where a disk on a DataNode machine becomes 100% full because the container stdout files are huge. For example:

/grid/sdb/hadoop/yarn/log/application_151746342014_5807/container_e37_151003535122014_5807_03_000001/stdout

from df -h , we can see

df -h /grid/sdb
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        1.8T  1.8T     0 100% /grid/sdb

Any suggestion on how to avoid huge stdout files? This issue actually causes the HDFS component on the DataNode to stop.

Second: since the path of stdout is:

/var/log/hadoop-yarn/containers/[application id]/[container id]/stdout

Is it possible to limit the file size, or to purge stdout when the file reaches a threshold?
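As a stopgap (not a YARN-level limit), a periodic cleanup job can empty oversized container stdout/stderr files. This is only a sketch: the log root below matches the path quoted above, and the 1 GB threshold is just an example; both should be adjusted to your yarn.nodemanager.log-dirs setting.

```shell
# Stopgap cleanup sketch -- LOG_ROOT and the 1 GB threshold are example
# values, not recommendations. truncate -s 0 empties the file in place,
# which keeps the file handle valid for a still-running container that
# has stdout open (deleting the file would not free the space until the
# container exits).
LOG_ROOT="${LOG_ROOT:-/var/log/hadoop-yarn/containers}"
SIZE_LIMIT="${SIZE_LIMIT:-+1G}"

find "$LOG_ROOT" -type f \( -name stdout -o -name stderr \) \
    -size "$SIZE_LIMIT" -exec truncate -s 0 {} \; 2>/dev/null || true
```

Run from cron (e.g. hourly) on each NodeManager host. Note this loses the truncated log content, so it is a safety valve, not a substitute for fixing the job that floods stdout.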

Michael-Bronson
5 REPLIES

Re: DataNode disks are full because of huge stdout files

Super Mentor

@Michael Bronson

What is the value set for the "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" property in your YARN configuration?

-> This is the maximum percentage of disk space utilization allowed, after which a disk is marked as bad. Values can range from 0.0 to 100.0; if the value is greater than or equal to 100, the NodeManager checks only for a completely full disk. It applies to both yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs. The default value is 90.0%. So either clean up the disk that the unhealthy node is using, or increase the threshold in yarn-site.xml.

Ambari -> YARN -> Configs -> Advanced yarn-site -> check "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage".

"yarn.nodemanager.log-dirs": this property determines where container logs are stored on the node while containers are running. It is always best to allocate a dedicated disk for it: check the current path of the property and, if needed, move it to a dedicated disk with enough free space.


Also, please check the value of the "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" property. Its default value is 0. This is the minimum space that must be available on a disk for it to be used, and it likewise applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.
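Outside Ambari, the two disk-health-checker properties above would be set directly in yarn-site.xml like this (the values shown are the defaults discussed in this thread, not recommendations):

```
<!-- yarn-site.xml: illustrative values (90.0 is the default utilization
     cut-off; the min-free-space default is 0 MB) -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <value>90.0</value>
</property>
<property>
  <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
  <value>0</value>
</property>
```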


Re: DataNode disks are full because of huge stdout files

@Jay the value is yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage=90

Michael-Bronson

Re: DataNode disks are full because of huge stdout files

In my Ambari, yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb=1000M

Michael-Bronson

Re: DataNode disks are full because of huge stdout files

@Jay Just to mention that we want to limit the size of stdout or stderr. Is it possible? For example, can we cap each file at 1 GB?
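Newer Hadoop releases add a NodeManager container log monitor that can enforce a size limit on container log directories. Whether these properties are available depends on your Hadoop/HDP version, so verify them against your release's yarn-default.xml before relying on them; the 1 GB value below is just an example matching the limit asked about above:

```
<!-- Assumption: the container-log-monitor properties exist only in
     newer Hadoop releases; check your version's yarn-default.xml.
     Sizes are in bytes. -->
<property>
  <name>yarn.nodemanager.container-log-monitor.enable</name>
  <value>true</value>
</property>
<property>
  <!-- per-container log directory limit, e.g. 1 GB -->
  <name>yarn.nodemanager.container-log-monitor.dir-size-limit-bytes</name>
  <value>1073741824</value>
</property>
```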

Michael-Bronson

Re: DataNode disks are full because of huge stdout files

@Jay any update?

Michael-Bronson