YARN Spark job filling up /dev/mapper/hostname-root (CentOS 7) while running

New Contributor

We are testing our first cluster, and Spark jobs appear to be failing due to a lack of disk space while they run. This is df -h while a job is running:

Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/centos_usnj1cldrmst1-root   50G   49G  1.5G  98% /
devtmpfs                                24G     0   24G   0% /dev
tmpfs                                   24G     0   24G   0% /dev/shm
tmpfs                                   24G  9.6M   24G   1% /run
tmpfs                                   24G     0   24G   0% /sys/fs/cgroup
/dev/mapper/centos_usnj1cldrmst1-home  198G   33M  197G   1% /home
/dev/sda1                             1014M  232M  783M  23% /boot
cm_processes                            24G   21M   24G   1% /run/cloudera-scm-agent/process
tmpfs                                  4.8G   12K  4.8G   1% /run/user/42
tmpfs                                  4.8G     0  4.8G   0% /run/user/1000
tmpfs                                  4.8G     0  4.8G   0% /run/user/0


This is what it looks like before the job starts:

Filesystem                             Size  Used Avail Use% Mounted on
/dev/mapper/centos_usnj1cldrmst1-root   50G   36G   15G  71% /
devtmpfs                                24G     0   24G   0% /dev
tmpfs                                   24G     0   24G   0% /dev/shm
tmpfs                                   24G  9.6M   24G   1% /run
tmpfs                                   24G     0   24G   0% /sys/fs/cgroup
/dev/mapper/centos_usnj1cldrmst1-home  198G   33M  197G   1% /home
/dev/sda1                             1014M  232M  783M  23% /boot
cm_processes                            24G   21M   24G   1% /run/cloudera-scm-agent/process
tmpfs                                  4.8G   12K  4.8G   1% /run/user/42
tmpfs                                  4.8G     0  4.8G   0% /run/user/1000
tmpfs                                  4.8G     0  4.8G   0% /run/user/0


[root@usnj1cldrmst1 ~]#  df -i /tmp
Filesystem                              Inodes  IUsed    IFree IUse% Mounted on
/dev/mapper/centos_usnj1cldrmst1-root 26214400 206658 26007742    1% /


Inode usage is fine (1%), so it is actual disk space being consumed. This is running on a VM, so we can easily expand the (single) disk, but /dev/mapper/centos_usnj1cldrmst1-root sits in the middle of the disk, so expanding it may not help. It may be easier to add a new volume and point the temporary work directory (wherever that is configured?) at the new location. I'm not sure whether we are simply configured incorrectly, but I'm looking for suggestions. Thank you
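
To narrow down what is actually filling the root partition while a job runs, something like the sketch below can help. The /yarn/nm and /yarn/container-logs paths are typical CDH defaults and only assumptions here, not confirmed for this cluster:

# Largest directories on the root filesystem only (-x skips /home, tmpfs, etc.)
du -xh --max-depth=2 / 2>/dev/null | sort -rh | head -20

# Check the usual YARN scratch locations directly (paths are assumptions)
du -sh /yarn/nm /yarn/container-logs /tmp/hadoop-yarn 2>/dev/null

If the space turns out to be under the NodeManager local or log directories, the accepted solution below addresses it.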

1 ACCEPTED SOLUTION

Mentor
Spark running on YARN will use the temporary storage presented to it by the NodeManagers where the containers run.

These directory path lists are configured via Cloudera Manager -> YARN -> Configuration -> "NodeManager Local Directories" and "NodeManager Log Directories".
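
For reference, those two CM fields manage the underlying yarn-site.xml properties shown in this sketch (on a CM-managed cluster you change them through CM, not by editing the file by hand). The /data1 mount point is a hypothetical stand-in for the new volume:

<!-- /data1 is a hypothetical mount point for the new, larger volume -->
<property>
  <name>yarn.nodemanager.local-dirs</name>
  <value>/data1/yarn/nm</value>
</property>
<property>
  <name>yarn.nodemanager.log-dirs</name>
  <value>/data1/yarn/container-logs</value>
</property>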

You can change these values to point to your new, larger volume, and YARN will stop using your root partition.
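
Before restarting the NodeManagers, the new directories must exist and be writable by the YARN user. A minimal sketch, assuming the hypothetical /data1 mount and the usual CDH yarn:hadoop ownership (verify both for your setup):

# /data1 is a hypothetical mount point; yarn:hadoop is typical CDH ownership
mkdir -p /data1/yarn/nm /data1/yarn/container-logs
chown -R yarn:hadoop /data1/yarn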

FWIW, the same applies for HDFS if you use it.
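
The HDFS counterpart is the DataNode data directory (the "DataNode Data Directory" field in CM, backed by dfs.datanode.data.dir); a sketch with the same hypothetical mount:

<!-- /data1 is a hypothetical mount point -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data1/dfs/dn</value>
</property>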

Also see: https://www.cloudera.com/documentation/enterprise/release-notes/topics/hardware_requirements_guide.h...

