
What to do when a NodeManager restart fails with "Input/output error"?

Explorer

Restarting YARN failed because the NodeManager restart hit an Input/output error.

How should I handle this error? Run fsck on the device? Remove the failing path from yarn.nodemanager.local-dirs?

The log shows the following:

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 153, in <module>
    Nodemanager().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 524, in restart
    self.start(env, upgrade_type=upgrade_type)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 50, in start
    self.configure(env) # FOR SECURITY
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 56, in configure
    yarn(name="nodemanager")
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/yarn.py", line 146, in yarn
    sudo=True,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 158, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 121, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 238, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'chmod -R 755 /letv/hadoop/yarn/local /var/hadoop/yarn/local /data/slot0/hadoop/yarn/local /data/slot1/hadoop/yarn/local /data/slot2/hadoop/yarn/local /data/slot3/hadoop/yarn/local /data/slot4/hadoop/yarn/local /data/slot5/hadoop/yarn/local /data/slot6/hadoop/yarn/local /data/slot7/hadoop/yarn/local /data/slot8/hadoop/yarn/local /data/slot9/hadoop/yarn/local /data/slota/hadoop/yarn/local /data/slotb/hadoop/yarn/local' returned 1. chmod: cannot access `/data/slot7/hadoop/yarn/local': Input/output error
6 REPLIES

@kang hua

See this:

/data/slot7/hadoop/yarn/local': Input/output error

Does this path exist?

ls -l /data/slot7/hadoop/yarn/local

If it does, then check the permissions.

Explorer

Yes, the path exists and the permissions are OK. However, the disk has a hardware error now!

@kang hua This is definitely a hardware/disk error. See this thread: http://unix.stackexchange.com/questions/39905/input-output-error-when-accessing-a-directory

You have to fix the disk issue. I am not sure whether fsck will help.

For now, you may want to remove that disk from your configs. Is it a production system?
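
As a rough illustration of removing the failing disk from the config, here is a minimal Python sketch that drops one path from a comma-separated yarn.nodemanager.local-dirs value. The function name `drop_failed_dirs` and the shortened example value are hypothetical; the real value would come from your own YARN config, and the result would still have to be saved through Ambari (or yarn-site.xml) before restarting the NodeManager.

```python
def drop_failed_dirs(local_dirs_value, failed_dirs):
    """Return the comma-separated dir list with the failed dirs removed.

    Hypothetical helper: it only edits the string, it does not touch any
    cluster configuration.
    """
    # Normalize trailing slashes so "/a/b/" and "/a/b" compare equal.
    failed = {d.rstrip("/") for d in failed_dirs}
    kept = [d.strip() for d in local_dirs_value.split(",") if d.strip()]
    kept = [d for d in kept if d.rstrip("/") not in failed]
    return ",".join(kept)


# Shortened example value, using three of the paths from the error message.
current = (
    "/data/slot6/hadoop/yarn/local,"
    "/data/slot7/hadoop/yarn/local,"
    "/data/slot8/hadoop/yarn/local"
)
print(drop_failed_dirs(current, ["/data/slot7/hadoop/yarn/local"]))
# prints /data/slot6/hadoop/yarn/local,/data/slot8/hadoop/yarn/local
```

This is only a stopgap: the disk itself still needs to be repaired or replaced, and the path added back afterwards.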

@kang hua Following up on this "disk hardware error".

Were you able to resolve it?

Explorer

I will try to resolve it next week.

If you have root, try running the command as the yarn user:

chmod -R 755 /data/slot7...

But if you are getting a disk hardware error now, I would assume something is wrong with one of the drives on the node. Can you copy files in that folder? Can you run a health check? Etc.
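
A quick way to see which of the local dirs are affected is to try to stat and list each one and report the OS error. The sketch below is only an assumption of what such a check could look like; `probe_dir` and `probe_all` are hypothetical names, and it is not a substitute for a real disk health check such as SMART diagnostics.

```python
import errno
import os


def probe_dir(path):
    """Try to stat and list a directory; return (ok, detail)."""
    try:
        os.stat(path)
        os.listdir(path)
    except OSError as exc:
        # A failing disk typically surfaces here as EIO (Input/output error).
        name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return False, "%s (%s)" % (exc.strerror, name)
    return True, "readable"


def probe_all(paths):
    """Probe every path and return {path: (ok, detail)}."""
    return {p: probe_dir(p) for p in paths}


# Two of the paths from the error message; replace with your own list.
dirs = ["/data/slot6/hadoop/yarn/local", "/data/slot7/hadoop/yarn/local"]
for path, (ok, detail) in probe_all(dirs).items():
    print("%-40s %s" % (path, "OK" if ok else "FAILED: " + detail))
```

Any directory that reports EIO sits on a device that needs attention before it goes back into yarn.nodemanager.local-dirs.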