
Yarn: One nodemanager refuse to start

Explorer

Hello, I am currently encountering a problem with one nodemanager.

 

I performed a snapshot rollback on the cluster, and since then only this NodeManager (1 of 3) has been having trouble.

 

http://pastebin.com/wsppupBf

 

In the log we can see:

chmod: changing permissions of `/var/run/cloudera-scm-agent/process/608-yarn-NODEMANAGER/container-executor.cfg': Operation not permitted
chmod: changing permissions of `/var/run/cloudera-scm-agent/process/608-yarn-NODEMANAGER/topology.map': Operation not permitted

So I tried to give the proper ownership to these files (yarn:hadoop).

But then Cloudera Manager recreates the configuration under

/var/run/cloudera-scm-agent/process/609-yarn-NODEMANAGER/

So I ended up with about ten YARN folders under the Cloudera agent process directory. I removed them, but the folder number kept incrementing each time I tried to start the NodeManager again.

So Cloudera Manager seems to create these 2 files with root:root ownership every time, which is weird since it shouldn't be able to do that.
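
To see which process directory is the current one and who owns the two files, something like the following could be used. This is only a sketch: the helper name `latest_process_dir` is made up for illustration, and it assumes the directories follow the `<number>-yarn-NODEMANAGER` naming seen in the errors above.

```shell
# Hypothetical helper: print the highest-numbered "<number>-<role>" directory
# under a base directory (for example the Cloudera agent process directory).
latest_process_dir() {
  base="$1"   # e.g. /var/run/cloudera-scm-agent/process
  role="$2"   # e.g. yarn-NODEMANAGER
  for d in "$base"/*-"$role"; do
    [ -d "$d" ] || continue            # skip an unmatched glob or non-directories
    name=${d##*/}                      # strip the leading path
    printf '%s %s\n' "${name%%-*}" "$d"  # "<number> <full path>"
  done | sort -n | tail -n 1 | cut -d' ' -f2-
}
```

For example, `ls -l "$(latest_process_dir /var/run/cloudera-scm-agent/process yarn-NODEMANAGER)"` would list the owner and group of each file in the newest NodeManager directory.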

 

I clearly don't understand what's going on here.

Any hints to help me resolve it?

 

Thanks :)

--
Lefevre Kevin
1 ACCEPTED SOLUTION

Re: Yarn: One nodemanager refuse to start

Hi,

 

Those errors won't cause any actual problems with your runtime. They will appear on perfectly functional NodeManagers. We need to find the real error.

 

Is there some kind of fatal error at the end of stderr log? In the NodeManager role logs?
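
One way to look for it is to pull the last error-like lines out of a log file. The helper below is just a sketch: the name `last_errors` and the `FATAL|ERROR|Exception` pattern are assumptions, not anything Cloudera Manager provides.

```shell
# Hypothetical helper: show the last few FATAL/ERROR/Exception lines of a log,
# prefixed with their line numbers.
last_errors() {
  log_file="$1"
  count="${2:-20}"   # how many matching lines to show (default 20)
  grep -n -E 'FATAL|ERROR|Exception' "$log_file" | tail -n "$count"
}
```

It can be pointed at the role's stderr log or at the NodeManager role log. The exact file locations depend on your installation, so pass whichever path applies on the affected host.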

 

Thanks,

Darren

Re: Yarn: One nodemanager refuse to start

Rising Star

What is the solution? We have the same issue when starting YARN.

 

Re: Yarn: One nodemanager refuse to start

New Contributor

Supervisor returned FATAL. Please check the role log file, stderr, or stdout.

 

I have the same issue: when I try to start the NodeManager, it complains that the operation is not permitted.

 

chmod: changing permissions of `/var/run/cloudera-scm-agent/process/3669-yarn-NODEMANAGER/container-executor.cfg': Operation not permitted
chmod: changing permissions of `/var/run/cloudera-scm-agent/process/3669-yarn-NODEMANAGER/topology.map': Operation not permitted
+ exec /usr/lib/hadoop-yarn/bin/yarn nodemanager

 

 

Re: Yarn: One nodemanager refuse to start

Rising Star

Have you changed something in the directory or file permissions in /var/run?

If yes, you should probably reconfigure YARN to use a NEW directory (for example, if YARN used /data/yarn/nm for the NodeManager, configure a new path such as /data/yarn/nm2). After we changed EVERY directory for YARN and restarted the cluster, YARN started, created the new directories, and set the permissions correctly, so now we don't have this kind of permission problem.

 

 

If you didn't change any permissions in the local file system, then I don't know what the issue is. Try another user: for example, run a Hive job as root, hdfs, yarn, or some other user, to see whether the problem is user-related or whether it always fails.

 

T.

 

Re: Yarn: One nodemanager refuse to start

New Contributor

chmod: changing permissions of `/var/run/cloudera-scm-agent/process/3669-yarn-NODEMANAGER/container-executor.cfg': Operation not permitted
chmod: changing permissions of `/var/run/cloudera-scm-agent/process/3669-yarn-NODEMANAGER/topology.map': Operation not permitted
+ exec /usr/lib/hadoop-yarn/bin/yarn nodemanager

Re: Yarn: One nodemanager refuse to start

New Contributor

What's the solution to this?

Re: Yarn: One nodemanager refuse to start

New Contributor
How can this problem be resolved?

Re: Yarn: One nodemanager refuse to start

Contributor
Do you have YARN HA turned on? If so, could you add this to your NodeManager safety valve, and then restart the NMs?

<property>
<name>yarn.nodemanager.recovery.dir</name>
<value>/var/lib/yarn-nm-recovery</value>
</property>

(Please create that /var/lib/yarn-nm-recovery directory, and change the owner to the `yarn' user.)
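
As a rough sketch, creating the directory and handing it to the right owner could look like this. The function name `prepare_recovery_dir` is made up for illustration, and changing ownership to another user normally requires root.

```shell
# Hypothetical helper: create a directory (and any missing parents) and
# change its owner. Only changes ownership if the directory was created.
prepare_recovery_dir() {
  dir="$1"     # e.g. /var/lib/yarn-nm-recovery
  owner="$2"   # e.g. yarn
  mkdir -p "$dir" && chown "$owner" "$dir"
}
```

Run as root on each affected NodeManager host, the call would be `prepare_recovery_dir /var/lib/yarn-nm-recovery yarn`.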

And if you're not running YARN HA, then I'm at a loss. Could you paste your NM log, from /var/log/hadoop-yarn/...?

Re: Yarn: One nodemanager refuse to start

Contributor

I just learned that this has nothing to do with YARN HA. So you're likely running into an NM recovery issue.

 

If you upgrade to Cloudera Manager 5.2.1 (or later), it will automatically default the recovery dir to a non-tmp location, so you'll be good. If you can't upgrade, you can manually set the config from the previous post.
