Created 02-13-2016 03:31 AM
Restarting YARN failed because the NodeManager restart hit an Input/output error.
How should I handle this error? Run fsck on the device? Remove the failing path from yarn.nodemanager.local-dirs?
The log is shown below:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 153, in <module>
    Nodemanager().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 524, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 50, in start
    self.configure(env) # FOR SECURITY
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 56, in configure
    yarn(name="nodemanager")
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/yarn.py", line 146, in yarn
    sudo=True,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'chmod -R 755 /letv/hadoop/yarn/local /var/hadoop/yarn/local /data/slot0/hadoop/yarn/local /data/slot1/hadoop/yarn/local /data/slot2/hadoop/yarn/local /data/slot3/hadoop/yarn/local /data/slot4/hadoop/yarn/local /data/slot5/hadoop/yarn/local /data/slot6/hadoop/yarn/local /data/slot7/hadoop/yarn/local /data/slot8/hadoop/yarn/local /data/slot9/hadoop/yarn/local /data/slota/hadoop/yarn/local /data/slotb/hadoop/yarn/local' returned 1.
chmod: cannot access `/data/slot7/hadoop/yarn/local': Input/output error
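The log shows the restart dying in a single `chmod -R 755` over all of the YARN local directories (these appear to be the yarn.nodemanager.local-dirs entries), so one unreadable disk blocks the whole NodeManager start. To find which entries are affected, each directory can be probed on its own and the errno inspected. Below is a minimal Python sketch, assuming the value is the comma-separated list used by yarn.nodemanager.local-dirs. `probe_local_dirs` is a hypothetical helper, not part of Ambari or Hadoop.

```python
import errno
import os


def probe_local_dirs(local_dirs):
    """Return {path: errno_name} for every directory that cannot be listed.

    local_dirs is a comma-separated string in the style of
    yarn.nodemanager.local-dirs (hypothetical helper, not an Ambari API).
    EIO usually points at a failing disk; ENOENT means the path is missing.
    """
    failures = {}
    for path in (p.strip() for p in local_dirs.split(",")):
        if not path:
            continue  # tolerate empty entries such as a trailing comma
        try:
            # Reading the directory is the first thing chmod -R has to do.
            os.listdir(path)
        except OSError as exc:
            failures[path] = errno.errorcode.get(exc.errno, str(exc.errno))
    return failures
```

For example, `probe_local_dirs("/var/hadoop/yarn/local,/data/slot7/hadoop/yarn/local")` would be expected to report `/data/slot7/hadoop/yarn/local` with `EIO` while that disk is failing, and leave the healthy directory out of the result.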
Created 02-13-2016 03:33 AM
See this:
chmod: cannot access `/data/slot7/hadoop/yarn/local': Input/output error
Does this path exist?
ls -l /data/slot7/hadoop/yarn/local
If it does, then check its permissions.
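A permission check can also be scripted. The sketch below reports the directory's own owner and mode and compares the mode with 755, the mode Ambari's chmod applies in the log. It is a minimal Python sketch; `check_mode` is a hypothetical helper. Note that on a failing disk `os.stat` itself may raise `OSError` with EIO, which is already the answer.

```python
import os
import pwd
import stat


def check_mode(path, expected=0o755):
    """Return (mode_matches, actual_mode, owner_name) for path itself.

    Hypothetical helper. os.stat raises OSError (e.g. EIO on a failing disk)
    if the inode cannot be read at all.
    """
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)  # permission bits only, e.g. 0o755
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)  # uid with no passwd entry
    return mode == expected, mode, owner
```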
Created 02-13-2016 03:53 AM
Yes, the path exists and the permissions are OK. However, there is now a disk hardware error!
Created 02-13-2016 01:01 PM
@kang hua This is definitely a hardware/disk error. See this thread: http://unix.stackexchange.com/questions/39905/input-output-error-when-accessing-a-directory
You have to fix the disk issue. I am not sure whether fsck will help.
For now, you may want to remove that disk from your configs. Is this a production system?
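Removing the disk from the configs amounts to taking its entry out of the comma-separated yarn.nodemanager.local-dirs value and restarting the NodeManager. In an Ambari-managed cluster that edit would be made in Ambari's YARN configuration rather than in yarn-site.xml by hand, since Ambari rewrites that file. If only one host is affected, an Ambari config group for that host may be preferable to changing the value cluster-wide. The string edit itself is sketched below in Python, assuming the failed disk is mounted at /data/slot7 as the log suggests; `drop_failed_dirs` is a hypothetical helper.

```python
def drop_failed_dirs(local_dirs, failed_mounts):
    """Return local_dirs (comma-separated) without entries on failed mounts.

    Hypothetical helper. failed_mounts holds mount points such as
    "/data/slot7" (an assumption about the disk layout). An entry is dropped
    if it equals a mount point or lives beneath it; "/data/slot70/..." is kept.
    """
    mounts = [m.rstrip("/") for m in failed_mounts]
    kept = []
    for path in (p.strip() for p in local_dirs.split(",")):
        if not path:
            continue
        on_failed_disk = any(path == m or path.startswith(m + "/") for m in mounts)
        if not on_failed_disk:
            kept.append(path)
    return ",".join(kept)
```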
Created 02-20-2016 02:12 AM
@kang hua Following up on the "disk hardware error":
Were you able to resolve it?
Created 02-20-2016 06:06 AM
I will try to resolve it next week.
Created 02-13-2016 11:39 AM
If you have root, try running the command as the yarn user:
chmod -R 755 /data/slot7...
But if you now get a disk hardware error, I would assume something is wrong with one of the drives on the node. Can you copy files into that folder? Can you run a health check?
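A quick way to answer "can you copy files into that folder?" is a write/read round trip in the directory. Below is a minimal Python sketch; `write_read_check` is a hypothetical helper. It is only a smoke test: the read-back may be served from the page cache, so a fuller check would look at the drive itself (for example kernel logs or SMART data via smartctl).

```python
import os
import tempfile


def write_read_check(directory, size=1 << 20):
    """Write `size` random bytes into directory, fsync, read back, compare.

    Hypothetical helper. Returns True if the data round-trips; any OSError
    (e.g. EIO on a failing disk) propagates to the caller. The temporary
    file is removed afterwards.
    """
    payload = os.urandom(size)
    fd, path = tempfile.mkstemp(dir=directory, prefix=".disk-check-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
            f.flush()
            os.fsync(f.fileno())  # force the data down to the device
        with open(path, "rb") as f:
            data = f.read()
        return data == payload
    finally:
        os.remove(path)
```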