Currently we use a dedicated large disk on each DataNode/NodeManager to host the logs and local files of running containers.
I have read that it is recommended to put the YARN local and log directories on multiple mount points, and more precisely on all HDFS disks (to prevent an I/O bottleneck, and to avoid losing the whole NodeManager in case of a disk failure).
I wonder whether this is dangerous for HDFS: if application logs fill up several HDFS mount points, what is the expected behavior of the HDFS service?
Thanks for your opinion and/or feedback,
Yes, it is definitely recommended to put the YARN local and log directories on multiple disks for resiliency. Putting them all on a single disk means that when that disk fails, the whole node becomes unusable for scheduling any more containers.
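As a rough sketch, spreading the directories across the data disks in yarn-site.xml could look like the fragment below. The mount points (/data1, /data2, /data3) and the subdirectory names are assumptions for illustration only; substitute the mounts your DataNodes actually use.

```xml
<!-- yarn-site.xml: hypothetical mount points, adjust to your own disk layout -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <!-- comma-separated list, one directory per physical disk -->
  <value>/data1/yarn/local,/data2/yarn/local,/data3/yarn/local</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data1/yarn/log,/data2/yarn/log,/data3/yarn/log</value>
</property>
```

With several directories listed, the NodeManager can keep running containers on the remaining disks when one of them fails.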
While you are right in general about the potential impact of container local/log I/O on HDFS reads/writes, it tends to be minimal in practice because the volume of container local/log data is usually very small compared to the HDFS data being read/written.
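On the concern about logs filling the disks, two settings are commonly used to bound the damage. The values below are illustrative assumptions, not recommendations, so check them against the defaults of your Hadoop version. As far as I know, the NodeManager disk health checker stops using a directory for new containers once the disk passes the utilization threshold, and the DataNode leaves the reserved amount on each volume free for non-HDFS use.

```xml
<!-- yarn-site.xml: mark a YARN directory as bad above this disk utilization -->
<property>
  <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
  <!-- illustrative value -->
  <value>90.0</value>
</property>
```

```xml
<!-- hdfs-site.xml: space in bytes per volume that HDFS leaves for non-HDFS use -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <!-- illustrative value: 10 GiB = 10 * 1024^3 bytes -->
  <value>10737418240</value>
</property>
```

Sizing the reserved space to cover the expected peak of container local and log data on each disk is one way to keep the two services from competing for the last free bytes.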