
How to configure YARN to limit YARN applications' storage consumption

Contributor

Hello Team,

 

We had a situation where one application consumed over 1 TB of disk space and eventually filled the disk; we had to kill the application to free the space. To prevent this from happening again, we want to limit how much storage a YARN application can consume. Could you please share how to configure this?

 

Best Regards

3 REPLIES

Rising Star

Does it happen to be a MapReduce job? After killing the job, did you clean up the disk space manually?

 

You can implement quotas to restrict disk space usage:
https://www.linux.com/news/implementing-quotas-restrict-disk-space-usage/
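
As a minimal sketch, assuming the YARN local and log directories sit on their own ext4 mount (the mount point /data and the yarn user below are hypothetical), the standard Linux quota tools look like this:

    # Enable user quotas on the mount (add "usrquota" to its /etc/fstab entry), e.g.:
    #   /dev/sdb1  /data  ext4  defaults,usrquota  0 2
    mount -o remount /data

    # Build the quota files and switch quotas on
    quotacheck -cum /data
    quotaon /data

    # Cap the yarn user at roughly 1 TB (limits are in 1 KiB blocks):
    # soft limit ~900 GiB, hard limit ~1000 GiB, no inode limits
    setquota -u yarn 943718400 1048576000 0 0 /data

    # Verify the limits and current usage
    repquota /data

Note that this caps everything the yarn user writes to that mount, not an individual application.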

Expert Contributor

Hi, this is most likely due to long-running jobs such as Spark Streaming, which continuously generate logs while they run.

 

We need to adjust logging on the application side. Taking Spark Streaming as the example again, we can add a rolling appender to the application's log4j.properties so the job rotates its logs at the size limit you set in the log4j file; a sketch follows.
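
As a minimal sketch (the file name custom-log4j.properties and the size/count values below are hypothetical examples to tune), the rolling appender can look like this:

    # custom-log4j.properties: rotate at 100 MB, keep at most 5 rolled files
    log4j.rootLogger=INFO, rolling
    log4j.appender.rolling=org.apache.log4j.RollingFileAppender
    log4j.appender.rolling.maxFileSize=100MB
    log4j.appender.rolling.maxBackupIndex=5
    # On YARN, Spark points spark.yarn.app.container.log.dir at the container's log directory
    log4j.appender.rolling.file=${spark.yarn.app.container.log.dir}/spark.log
    log4j.appender.rolling.layout=org.apache.log4j.PatternLayout
    log4j.appender.rolling.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

Ship the file with the job and point both the driver and the executors at it (the application jar name is a placeholder):

    spark-submit \
      --master yarn \
      --files custom-log4j.properties \
      --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=custom-log4j.properties" \
      --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=custom-log4j.properties" \
      your-streaming-app.jar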

For detailed steps, please refer to:
https://my.cloudera.com/knowledge/Long-running-Spark-streaming-applications-s-YARN-container?id=9061...
https://my.cloudera.com/knowledge/Video-KB-How-to-configure-log4j-for-Spark-on-YARN-cluster?id=27120...

 

Other types of jobs are similar: we need the application team to tune the log level so the jobs do not generate an unbounded amount of logs.

Contributor

Hello @ywu, thank you for the links; this helps. So, if I understand correctly, there is not much we can do from the YARN side to control the size of the logs; it has to be handled in the application itself, since otherwise the application log files will keep growing until the disk fills up and the NodeManager goes into the decommissioned state. Is that right?