What should I do about an "Input/output error" when restarting the NodeManager?


Restarting YARN failed because the NodeManager restart hit an Input/output error.

How should I handle this error? Run fsck on the device? Remove the failing path from yarn.nodemanager.local-dirs?

The log is shown below:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/", line 153, in <module>
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/", line 219, in execute
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/", line 524, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/YARN/", line 50, in start
    self.configure(env) # FOR SECURITY
  File "/var/lib/ambari-agent/cache/common-services/YARN/", line 56, in configure
  File "/usr/lib/python2.6/site-packages/ambari_commons/", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/YARN/", line 146, in yarn
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 154, in __init__
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 158, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 121, in run_action
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/", line 238, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'chmod -R 755 /letv/hadoop/yarn/local /var/hadoop/yarn/local /data/slot0/hadoop/yarn/local /data/slot1/hadoop/yarn/local /data/slot2/hadoop/yarn/local /data/slot3/hadoop/yarn/local /data/slot4/hadoop/yarn/local /data/slot5/hadoop/yarn/local /data/slot6/hadoop/yarn/local /data/slot7/hadoop/yarn/local /data/slot8/hadoop/yarn/local /data/slot9/hadoop/yarn/local /data/slota/hadoop/yarn/local /data/slotb/hadoop/yarn/local' returned 1. chmod: cannot access `/data/slot7/hadoop/yarn/local': Input/output error

@kang hua

See this

/data/slot7/hadoop/yarn/local': Input/output error

Does this exist ?

ls -l /data/slot7/hadoop/yarn/local

If it does, then check the permissions.


Yes, the path exists and the permissions are OK. However, the disk has a hardware error now!

@kang hua This is definitely a hardware/disk error. See this thread

You have to fix the disk issue. I am not sure if fsck will help.

For now, you may want to remove that disk from your configs. Is it a production system?
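
If you do take the failing disk out of the configuration for now, the change amounts to dropping one entry from the comma-separated yarn.nodemanager.local-dirs value. Here is a minimal sketch in Python, assuming you paste the current value in by hand. drop_failed_dirs is a hypothetical helper for illustration, not part of Ambari or YARN, and the resulting value still has to be saved through whatever you normally use to manage the config (Ambari, in this case) before restarting the NodeManager.

```python
def drop_failed_dirs(local_dirs_value, failed_dirs):
    """Return the comma-separated local-dirs value without the failed directories.

    Hypothetical helper: it only edits the string, it does not touch any
    cluster configuration.
    """
    # Compare paths without trailing slashes so "/x/" and "/x" match.
    failed = set(d.rstrip("/") for d in failed_dirs)
    kept = []
    for entry in local_dirs_value.split(","):
        entry = entry.strip()
        if entry and entry.rstrip("/") not in failed:
            kept.append(entry)
    return ",".join(kept)


# Example with three of the paths from the chmod error above.
current = (
    "/data/slot6/hadoop/yarn/local,"
    "/data/slot7/hadoop/yarn/local,"
    "/data/slot8/hadoop/yarn/local"
)
print(drop_failed_dirs(current, ["/data/slot7/hadoop/yarn/local"]))
```

The order of the remaining entries is preserved, so the new value differs from the old one only by the removed disk.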

@kang hua following up on this ..."disk hardware error"

Were you able to resolve it?


I will try to resolve it next week.

If you have root, try the command as yarn:

chmod -R 755 /data/slot7...

But if you are getting a disk hardware error now, I would assume something is wrong with one of the drives on the node. Can you copy files in that folder? Can you run a health check? Etc.
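
As a rough way to see which of the local dirs are affected, something like the sketch below could be run on the node. It is only an assumption of what a quick check might look like: probe_dirs is a hypothetical helper that tries to stat and list each directory and reports the errno name on failure (EIO corresponds to the "Input/output error" in the log). It does not replace a proper hardware check of the drive.

```python
import errno
import os


def probe_dirs(paths):
    """Try to stat and list each directory.

    Returns a dict mapping each path to None if it is readable, or to the
    errno name (for example "EIO" or "ENOENT") if access fails.
    """
    results = {}
    for path in paths:
        try:
            os.stat(path)
            os.listdir(path)
            results[path] = None
        except OSError as exc:
            # Fall back to the raw number if the errno has no symbolic name.
            results[path] = errno.errorcode.get(exc.errno, str(exc.errno))
    return results


# Two of the directories from the failing chmod command.
for path, error in sorted(probe_dirs([
    "/data/slot6/hadoop/yarn/local",
    "/data/slot7/hadoop/yarn/local",
]).items()):
    print("%s: %s" % (path, error or "ok"))
```

A directory that reports EIO here is a candidate for removal from yarn.nodemanager.local-dirs until the disk is repaired or replaced.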