We have seen a few cases where disks on a DataNode machine became 100% full
because the container stdout files are huge.
From df -h, we can see:
df -h /grid/sdb
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb        1.8T  1.8T    0T 100% /grid/sdb
Any suggestions on how to avoid stdout files growing this large? This issue actually causes the HDFS component on the DataNode to stop.
Second, since the path of stdout is:
/var/log/hadoop-yarn/containers/[application id]/[container id]/stdout
is it possible to limit the file size, or to purge stdout when the file reaches a threshold?
What value is set for the "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage" property in your YARN configuration?
-> This is the maximum percentage of disk space utilization allowed, after which a disk is marked as bad. Values can range from 0.0 to 100.0. If the value is greater than or equal to 100, the NodeManager will check for a full disk. This applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs. The default value is 90.0%. Hence, either clean up the disk that the unhealthy node is running on, or increase the threshold in yarn-site.xml.
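To get a rough feel for what that threshold means on a given mount, the comparison can be sketched in shell. This is only an approximation built on df, not the NodeManager's own health check. The function name disk_over_threshold is made up for illustration, and df rounds the percentage, so results very close to the limit may differ.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): succeed when the filesystem
# holding "$1" is more than "$2" percent used, according to df.
disk_over_threshold() {
    path=$1
    threshold=$2
    # df -P prints one header line plus one data line; field 5 is the use percentage.
    df -P "$path" | awk -v t="$threshold" '
        NR == 2 { gsub("%", "", $5); over = ($5 + 0 > t + 0) }
        END { exit (over ? 0 : 1) }'
}

# Example call with the mount from the question and the default of 90:
#   disk_over_threshold /grid/sdb 90 && echo "/grid/sdb is above 90%"
```

With the df output from the question (100% used), a call like the commented example would report the disk as being over a 90% limit.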
Ambari -> YARN -> Configs -> Advanced yarn-site -> check "yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage".
"yarn.nodemanager.log-dirs": It is always best to allocate a dedicated disk for this. Check the path set in the "yarn.nodemanager.log-dirs" property and move it to a dedicated disk where enough space is available. This property determines where the container logs are stored on the node while the containers are running.
Also, please check the value of the "yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb" property. Its default value is 0. This is the minimum space that must be available on a disk for it to be used. It applies to yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs.
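Before moving yarn.nodemanager.log-dirs or cleaning up, it can help to see which applications hold the most log data. The following is a minimal sketch, assuming the log directory is laid out as one subdirectory per application, as in the path given in the question. The function name largest_log_dirs is hypothetical and not a Hadoop tool.

```shell
#!/bin/sh
# Hypothetical helper: print the "$2" largest entries directly under the
# container log root "$1", as "<size in KB><TAB><path>", largest first.
largest_log_dirs() {
    root=$1
    count=${2:-5}
    # du -sk reports the total size of each application directory in KB.
    du -sk "$root"/* 2>/dev/null | sort -rn | head -n "$count"
}

# Example call, using the log path mentioned in the question:
#   largest_log_dirs /var/log/hadoop-yarn/containers 10
```

The output is plain du output, so the first line names the application directory that uses the most space.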
@Jay the value is yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage=90
@Jay just to mention that we want to limit the size of stdout or stderr. Is that possible? For example, say we want to limit the size to 1G per file.
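As a starting point for watching that limit, the files that already exceed it can be listed with find. This is a rough sketch and not a YARN setting: it only reports files and does not truncate or delete anything. The function name oversized_container_logs is hypothetical, and the size suffixes (k, G) are assumed to be the ones GNU find accepts.

```shell
#!/bin/sh
# Hypothetical helper: list container stdout/stderr files under "$1" that are
# larger than "$2", where "$2" is a find(1) size such as 100k or 1G.
oversized_container_logs() {
    root=$1
    limit=${2:-1G}
    # Match only regular files named stdout or stderr, at any depth.
    find "$root" -type f \( -name stdout -o -name stderr \) -size +"$limit"
}

# Example call, using the path and the 1G limit from this thread:
#   oversized_container_logs /var/log/hadoop-yarn/containers 1G
```

Run periodically, for example from cron, this would show which containers are writing oversized stdout or stderr before the disk fills up.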