YARN: One NodeManager refuses to start

Contributor

Hello, I am currently having a problem with one NodeManager.

I rolled the cluster back to a snapshot, and since then only this NodeManager (1 of 3) has been having trouble.

 

http://pastebin.com/wsppupBf

 

There we can see:

chmod: changing permissions of `/var/run/cloudera-scm-agent/process/608-yarn-NODEMANAGER/container-executor.cfg': Operation not permitted
chmod: changing permissions of `/var/run/cloudera-scm-agent/process/608-yarn-NODEMANAGER/topology.map': Operation not permitted

So I tried to give these files the proper ownership (yarn:hadoop).

But then Cloudera Manager recreated the configuration under

/var/run/cloudera-scm-agent/process/609-yarn-NODEMANAGER/

So I ended up with about ten yarn folders under the Cloudera agent process directory. I removed them, but the folder number kept incrementing each time I tried to start the role again.

So Cloudera Manager seems to create these two files with root:root ownership every time, which is weird, since it shouldn't be able to do that.

 

I clearly don't understand what's going on here.

Any hints to help me resolve it?

 

Thanks 🙂

--
Lefevre Kevin
1 ACCEPTED SOLUTION


Hi,

 

Those errors won't cause any actual problems with your runtime. They will appear on perfectly functional NodeManagers. We need to find the real error.

 

Is there some kind of fatal error at the end of the stderr log, or in the NodeManager role logs?
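
If the logs are long, something like the following might help pull out the last severe entries. It is only a minimal sketch: the function name `last_severe_lines` is hypothetical, and it assumes the log uses Log4j-style `FATAL` / `ERROR` level markers, which may not match every line in your files.

```python
import re
from collections import deque

# Assumption: severity appears as a standalone word, as in Log4j-style role logs.
SEVERE = re.compile(r"\b(FATAL|ERROR)\b")


def last_severe_lines(lines, tail=200, limit=5):
    """Return up to `limit` of the most recent FATAL/ERROR lines
    found within the final `tail` lines of the input."""
    # deque with maxlen keeps only the last `tail` lines in memory.
    recent = deque(lines, maxlen=tail)
    hits = [line.rstrip("\n") for line in recent if SEVERE.search(line)]
    return hits[-limit:]
```

To try it, open the stderr log or the role log (wherever it lives on your host) and pass the file object in, for example `last_severe_lines(open(path, errors="replace"))`, then print the returned lines.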

 

Thanks,

Darren

10 REPLIES

New Contributor

I ran into this error, and it was caused by the NodeManager running out of heap. I increased the heap size, and YARN came up without errors.
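
For anyone who wants to check whether they are in the same situation before raising the heap, here is a rough sketch that counts JVM out-of-memory messages in a log and groups them by reason. The helper name `oom_reasons` is hypothetical; it only relies on the standard JVM message prefix `java.lang.OutOfMemoryError`, and it assumes the reason text follows on the same line.

```python
import re
from collections import Counter

# Standard JVM message, e.g. "java.lang.OutOfMemoryError: Java heap space".
# The reason after the colon is optional.
OOM = re.compile(r"java\.lang\.OutOfMemoryError(?::\s*(?P<reason>.+))?")


def oom_reasons(lines):
    """Count OutOfMemoryError occurrences in `lines`, keyed by reason text."""
    counts = Counter()
    for line in lines:
        match = OOM.search(line)
        if match:
            # Fall back to "unspecified" when no reason follows the class name.
            reason = (match.group("reason") or "").strip() or "unspecified"
            counts[reason] += 1
    return counts
```

A count under "Java heap space" would be consistent with what I saw. Other reasons would probably point to a different cause, so a bigger heap may not help there.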