
DataNode disks are full because of huge stdout files


Hi all,

We have seen several cases where the disks on a DataNode machine became 100% full because the stdout files are huge.

For example, df -h shows:

df -h /grid/sdb
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        1.8T  1.8T  0T   100% /grid/sdb

Any suggestions on how to keep the stdout files from growing this large? This issue actually causes the HDFS component on the DataNode to stop.

Second, since the path of stdout is:

/var/log/hadoop-yarn/containers/[application id]/[container id]/stdout

is it possible to limit the file size, or to purge stdout when the file reaches a threshold?
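
To see which container logs are actually responsible before deciding on a limit, something like the sketch below could be run on the affected node. It only lists files and changes nothing. The function name `list_big_logs` is hypothetical, the root directory is the one from the path above, and the `M` size suffix assumes GNU or BSD `find` (it is not POSIX).

```shell
#!/bin/sh
# Hypothetical helper: list stdout/stderr files larger than a given size.
#   $1 = root of the container log tree
#   $2 = size threshold in MiB (defaults to 1024, i.e. about 1 GiB)
list_big_logs() {
    root=$1
    min_mb=${2:-1024}
    # -size +NM matches files larger than N MiB (GNU/BSD find extension).
    find "$root" -type f \( -name stdout -o -name stderr \) \
        -size +"${min_mb}"M -print
}

# Example call (path taken from the question; adjust to your log-dirs):
#   list_big_logs /var/log/hadoop-yarn/containers 1024
```

The application id and container id in each printed path show which job is producing the output.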



@Michael Bronson

What value is set for the "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" property in your YARN configuration?

-> This is the maximum percentage of disk space utilization allowed, after which a disk is marked as bad. Values can range from 0.0 to 100.0. If the value is greater than or equal to 100, the NodeManager will check for a full disk. This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs. The default value is 90.0%. Hence, either clean up the disk on the unhealthy node or increase the threshold in yarn-site.xml.

Ambari -> YARN -> Configs -> Advanced yarn-site -> check "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage".

"yarn.nodemanager.log-dirs": This property determines where the container logs are stored on the node while the containers are running. It is always best to allocate a dedicated disk for it: check the path set in "yarn.nodemanager.log-dirs" and move it to a dedicated disk with enough free space.

Also, please check the value of the "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" property. Its default value is 0. This is the minimum space that must be available on a disk for it to be used. It applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.
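
As a rough illustration of how the two thresholds above combine, here is a minimal shell sketch. It is not YARN code: the function names `disk_state` and `check_mount` and the `MAX_PCT` / `MIN_FREE_MB` variables are hypothetical, and the percentage reported by `df` may differ slightly from what the NodeManager computes internally.

```shell
#!/bin/sh
# Hypothetical helper (not part of YARN): classify a disk using the two
# health-checker thresholds described above.
#   $1 = used percentage (integer)
#   $2 = free space in MB
#   $3 = max-disk-utilization-per-disk-percentage (default 90)
#   $4 = min-free-space-per-disk-mb (default 0)
disk_state() {
    used_pct=$1
    free_mb=$2
    max_pct=${3:-90}
    min_free_mb=${4:-0}
    # Bad if over the utilization limit, completely full, or short on free space.
    if [ "$used_pct" -gt "$max_pct" ] || [ "$used_pct" -ge 100 ] ||
       [ "$free_mb" -lt "$min_free_mb" ]; then
        echo bad
    else
        echo good
    fi
}

# Read the real numbers for one mount point and classify it.
check_mount() {
    # df -Pk prints: Filesystem 1024-blocks Used Available Capacity Mounted-on
    set -- $(df -Pk "$1" | awk 'NR==2 { sub("%", "", $5); print $5, int($4 / 1024) }')
    disk_state "$1" "$2" "${MAX_PCT:-90}" "${MIN_FREE_MB:-0}"
}

# Example call, using the mount point from the question:
#   MAX_PCT=90 MIN_FREE_MB=1000 check_mount /grid/sdb
```

With the disk from the question at 100% used, this reports the disk as bad for any threshold setting, so raising the percentage alone would not help until space is freed.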


@Jay the value is yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage=90



yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb=1000M in my Ambari



@Jay just to mention: we want to limit the size of stdout or stderr. Is that possible? For example, say we want to limit each file to 1 GB.



@Jay any update?