06-14-2017 04:04 AM - edited 06-14-2017 04:06 AM
I am using Cloudera Express 5.5.1 to manage an 8-node cluster.
I was trying to run a new hadoop job when it failed due to some space issues. I was getting the following error:
"No space available in any of the local directories."
After that, I decided to restart the YARN/MapReduce service to check whether the problem was gone. However, I am able to restart all the roles except the JobHistory Server, which fails with a timeout exception. I cannot find any other clue about the issue, as nothing is shown in any log.
Any clue? What should I do?
Many thanks in advance.
06-14-2017 05:12 AM - edited 06-14-2017 05:13 AM
Well, from the error you have shared, it seems the disks YARN uses for its local working directories are full.
Free up some space on them before restarting the services.
06-14-2017 05:19 AM
Thank you for your response.
The JobHistory service is running on the Headnode, which seems to have plenty of space available according to df -Th output:
df -Th
Filesystem     Type      Size  Used Avail Use% Mounted on
/dev/sda1      ext4      164G   66G   90G  43% /
none           tmpfs     4.0K     0  4.0K   0% /sys/fs/cgroup
udev           devtmpfs  6.9G  4.0K  6.9G   1% /dev
tmpfs          tmpfs     1.4G  776K  1.4G   1% /run
none           tmpfs     5.0M     0  5.0M   0% /run/lock
none           tmpfs     6.9G     0  6.9G   0% /run/shm
none           tmpfs     100M     0  100M   0% /run/user
cm_processes   tmpfs     6.9G   16M  6.9G   1% /run/cloudera-scm-agent/process
Also, I am having the same issue with other roles (Oozie and the YARN ResourceManager).
06-14-2017 05:36 AM - edited 06-14-2017 05:37 AM
The space error you have shared comes from a worker node, not from the head node.
Check the disks on the worker nodes first.
Only then investigate why the JobHistory Server failed to restart.
Open the role's log file and look for the actual error.
If several roles on the same host show the same issue, you might want to restart the Cloudera Manager agent on that node. I remember seeing commands time out when the agent was not working properly.
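The worker-node disk check suggested above could be sketched roughly as follows, to be run on each worker. This is only an illustration: the directory paths are hypothetical placeholders (use whatever yarn.nodemanager.local-dirs is set to in your configuration), and the 90% threshold is an assumption, not a value taken from this cluster.

```python
import shutil


def used_percent(path):
    """Return how full the filesystem holding `path` is, as df-style Use%."""
    usage = shutil.disk_usage(path)
    denominator = usage.used + usage.free
    if denominator == 0:
        return 0.0
    return 100.0 * usage.used / denominator


def find_full_dirs(dirs, max_used_percent=90.0):
    """Return (path, used %) for every directory above the threshold.

    Directories that cannot be read are reported and skipped.
    """
    full = []
    for path in dirs:
        try:
            percent = used_percent(path)
        except OSError as exc:
            print(f"cannot check {path}: {exc}")
            continue
        if percent > max_used_percent:
            full.append((path, round(percent, 1)))
    return full


if __name__ == "__main__":
    # Hypothetical paths: replace with the NodeManager local dirs of your cluster.
    local_dirs = ["/yarn/nm", "/tmp"]
    for path, percent in find_full_dirs(local_dirs):
        print(f"{path} is {percent}% full")
```

If this prints any directory on a worker, that filesystem is the first place to free up space before retrying the job.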
06-15-2017 02:12 PM