
Yarn: One nodemanager refuse to start

Contributor

Hello, I am currently encountering a problem with one nodemanager.

 

I used a snapshot rollback on the cluster, and since then, only this NodeManager (1 of 3) has been having trouble.

 

http://pastebin.com/wsppupBf

 

There we can see:

chmod: changing permissions of `/var/run/cloudera-scm-agent/process/608-yarn-NODEMANAGER/container-executor.cfg': Operation not permitted
chmod: changing permissions of `/var/run/cloudera-scm-agent/process/608-yarn-NODEMANAGER/topology.map': Operation not permitted

So I tried to give the proper ownership to these files (yarn:hadoop).
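
For reference, the ownership fix I attempted looked roughly like this (illustrative only; the process directory number changes on every start attempt):

# 608-yarn-NODEMANAGER is just the directory from the log above.
cd /var/run/cloudera-scm-agent/process/608-yarn-NODEMANAGER
ls -l container-executor.cfg topology.map
chown yarn:hadoop container-executor.cfg topology.map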

But then Cloudera Manager recreates the configuration under

/var/run/cloudera-scm-agent/process/609-yarn-NODEMANAGER/

so I ended up with something like ten YARN folders under the Cloudera agent process directory. I removed them, but the folder number kept incrementing every time I tried to start the NodeManager again.

So Cloudera Manager seems to create these two files with root:root ownership every time, which is weird since it shouldn't be able to do that.

 

I clearly don't understand what's going on here.

Any hint to help me resolve it?

 

Thanks 🙂

--
Lefevre Kevin
1 ACCEPTED SOLUTION


Hi,

 

Those errors won't cause any actual problems with your runtime. They will appear on perfectly functional NodeManagers. We need to find the real error.

 

Is there some kind of fatal error at the end of the stderr log, or in the NodeManager role logs?
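
For reference, on a Cloudera Manager managed host those logs usually live in locations like the following (sketch only; the process number and file names will differ on your node):

# Illustrative paths only; adjust the process number and log file name.
tail -n 100 /var/run/cloudera-scm-agent/process/<NNN>-yarn-NODEMANAGER/logs/stderr.log
tail -n 100 /var/log/hadoop-yarn/*NODEMANAGER*.log.out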

 

Thanks,

Darren


10 REPLIES

Expert Contributor

What is the solution? We have the same issue with starting YARN.

 

Explorer

Supervisor returned FATAL. Please check the role log file, stderr, or stdout.

 

I have the same issue. When I try to start the NodeManager, it complains about "Operation not permitted":

 

chmod: changing permissions of `/var/run/cloudera-scm-agent/process/3669-yarn-NODEMANAGER/container-executor.cfg': Operation not permitted
chmod: changing permissions of `/var/run/cloudera-scm-agent/process/3669-yarn-NODEMANAGER/topology.map': Operation not permitted
+ exec /usr/lib/hadoop-yarn/bin/yarn nodemanager

 

 

Expert Contributor

Have you changed something in the directory or file permissions under /var/run?

If yes, you should probably reconfigure YARN to use a NEW directory (for example, if YARN used /data/yarn/nm for the NodeManager, configure a new path such as /data/yarn/nm2). After changing EVERY directory for YARN and restarting the cluster, YARN started, created the new directories, and set the permissions correctly, so we no longer have this kind of permission problem. A quick sanity check you can run after the restart is shown below.
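
A sketch of that check, using the example paths above (substitute whatever yarn.nodemanager.local-dirs points to on your cluster):

# /data/yarn/nm and /data/yarn/nm2 are only the illustrative paths from this post.
ls -ld /data/yarn/nm /data/yarn/nm2
stat -c '%U:%G %a %n' /data/yarn/nm /data/yarn/nm2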

 

 

If you didn't change any permissions on the local file system, then I don't know what the issue is. Try another user, for example run a Hive job as root/hdfs/yarn or some other user, to see whether this is user-related or whether it always fails.
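
For example, a quick smoke test as a different user could look like this (sketch only; it uses the stock MapReduce pi example rather than Hive, and the jar path assumes a package, non-parcel, install):

# Run a trivial YARN job as another user (hdfs here) to see if the failure is user-related.
sudo -u hdfs hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar pi 2 10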

 

T.

 

Explorer

chmod: changing permissions of `/var/run/cloudera-scm-agent/process/3669-yarn-NODEMANAGER/container-executor.cfg': Operation not permitted
chmod: changing permissions of `/var/run/cloudera-scm-agent/process/3669-yarn-NODEMANAGER/topology.map': Operation not permitted
+ exec /usr/lib/hadoop-yarn/bin/yarn nodemanager

avatar

What's the solution to this?

New Contributor
How to resolve this problem?

Rising Star
Do you have YARN HA turned on? If so, could you add this to your NodeManager safety valve, and then restart the NMs?

<property>
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/yarn-nm-recovery</value>
</property>

(Please create that /var/lib/yarn-nm-recovery directory, and change the owner to the `yarn` user.)
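
A minimal sketch of that setup (the hadoop group is my assumption; the important part is that the yarn user owns the directory):

# Create the recovery dir and hand it to the yarn user on every NodeManager host.
mkdir -p /var/lib/yarn-nm-recovery
chown yarn:hadoop /var/lib/yarn-nm-recovery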

And if you're not running YARN HA, then I'm at a loss. Could you paste your NM log, from /var/log/hadoop-yarn/...?

Rising Star

I just learned that this has nothing to do with YARN HA. So you're likely running into an NM recovery issue.

 

If you upgrade to Cloudera Manager 5.2.1 (or later), it will automatically default the recovery dir to a non-tmp location, so you'll be good. If you can't upgrade, you can manually set the config from the previous post.
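
If you want to confirm which recovery dir a NodeManager actually picked up, you can check the generated config in its current process directory (sketch only; the process number differs on every start):

# Illustrative check; replace <NNN> with the current process directory number.
grep -A1 yarn.nodemanager.recovery.dir \
    /var/run/cloudera-scm-agent/process/<NNN>-yarn-NODEMANAGER/yarn-site.xml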