Yarn NodeManager fails to start

Hello,

We just enabled FreeIPA integration in our Hortonworks cluster (HDP 2.5.3).
We understand that, as part of the Kerberization of YARN, Ambari tries to delete the directories listed in yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs. In our environment the directories defined for these settings are actual mount points, so, as expected, the NodeManager start fails with "OSError: [Errno 16] Device or resource busy: '/hadoop/yarn/local/01'".

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 161, in <module>
    Nodemanager().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 51, in start
    self.configure(env) # FOR SECURITY
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/nodemanager.py", line 57, in configure
    yarn(name="nodemanager")
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/yarn.py", line 168, in yarn
    action='delete'
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 114, in __new__
    cls(names_list.pop(0), env, provider, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 208, in action_delete
    sudo.rmtree(path)
  File "/usr/lib/python2.6/site-packages/resource_management/core/sudo.py", line 102, in rmtree
    shutil.rmtree(path)
  File "/usr/lib64/python2.7/shutil.py", line 256, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib64/python2.7/shutil.py", line 254, in rmtree
    os.rmdir(path)
OSError: [Errno 16] Device or resource busy: '/hadoop/yarn/local/01'
2017-10-25 18:19:36,019 - checked_call returned (0, '')
2017-10-25 18:19:36,019 - Ensuring that hadoop has the correct symlink structure
2017-10-25 18:19:36,019 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2017-10-25 18:19:36,025 - Directory['/hadoop/yarn/local/01'] {'action': ['delete']}
2017-10-25 18:19:36,026 - Removing directory Directory['/hadoop/yarn/local/01'] and all its content

Command failed after 1 tries

Any idea how we can get around this? Is there a script or flag we can modify so that it does not attempt to delete these directories?

Just an FYI, we can start the NM service on our DN manually.

Any help would be appreciated.

Thanks!

K


Re: Yarn NodeManager fails to start

Super Mentor

@KMan

Yes, Ambari shows the following warning message when you enable Kerberos from the Ambari UI:
"YARN log and local dir will be deleted and ResourceManager state will be formatted as part of Enabling/Disabling Kerberos."

This was implemented as part of AMBARI-13012 (https://issues.apache.org/jira/browse/AMBARI-13012): if YARN is an installed service, then in the first step of the Kerberos wizard, and also when disabling Kerberos, the user should be informed that the YARN log and local dirs will be deleted and the RM state will be formatted as part of enabling/disabling Kerberos. This lets the user back up these directories at the beginning of the wizard, if desired.


The error you are seeing occurs when the "local-dirs" are mounted filesystems: the mount point itself cannot be removed, so the delete step will always fail this way.

So you should unmount the directories, enable Kerberos, and then remount them.
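
To see in advance which of the configured directories would trip the delete step, something like the following could be run on each NodeManager host. This is only a rough sketch, not part of Ambari: the function name find_mount_points is made up for illustration, and the second path in the example value is an assumption, so substitute the actual value of yarn.nodemanager.local-dirs (and yarn.nodemanager.log-dirs) from your configuration.

```python
import os


def find_mount_points(dirs_value):
    """Return the entries of a comma-separated directory list that are mount points.

    Hypothetical helper for illustration; it only inspects the local filesystem.
    """
    paths = [p.strip() for p in dirs_value.split(",") if p.strip()]
    return [p for p in paths if os.path.ismount(p)]


if __name__ == "__main__":
    # Assumed example value; only '/hadoop/yarn/local/01' appears in the error above.
    local_dirs = "/hadoop/yarn/local/01,/hadoop/yarn/local/02"
    for path in find_mount_points(local_dirs):
        print("mount point, unmount before enabling Kerberos: %s" % path)
```

Any path it prints is a directory that would have to be unmounted before running the Kerberos wizard and remounted afterwards.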


Re: Yarn NodeManager fails to start

Yup. We got it resolved earlier this evening, and that's what we did. Is this mentioned anywhere in the YARN installation/configuration documentation? After all, people usually don't set up Kerberos at the same time as they set up the components, so the components are typically already in use before they are Kerberized, right?

Anyhow, thanks for the response.

Re: Yarn NodeManager fails to start

New Contributor

I faced a similar issue.

We changed the NodeManager local directory to a different mount point and restarted the services, and they came up and ran. After that, revert the change back to the old directory.
