
NodeManagers go down after a few minutes in HDPCA AWS Instance for no reason

New Contributor

Hi guys,

I'm playing around with the HDPCA 2.3 AWS instance and I have some issues when adding node1.

I just installed the clients, and for no explainable reason Ambari alerts that the 3 NodeManagers are down.

When I restart them, they are reported "running" for a few minutes and become red again.

yarn node -list says all three are running.

Same for the ResourceManager Web UI.

The alert is about the NodeManager web service on port 8042.

After trying this with a new instance and having the same problem, I started my very own HDP installation on 6 vanilla CentOS instances. At some point, I had the same issues.

I don't have any idea what the reason might be or where I can look for deeper analysis.

Any help would be much appreciated.

Thanks and bye,



Hey @Chris K!
Could you check the logs and share any ERROR/WARN/FATAL messages with us?


Just in case, check whether any process is listening on port 8042:

netstat -tulpn | grep 8042

Hope this helps!

New Contributor

Hi @Vinicius Higa Murakami,

When starting the NodeManager via Ambari, I see a process listening on 8042 for about one second:

[root@resourcemanager ~]# while [ true ]; do sleep 1; netstat -tupln | grep 8042; done
tcp        0      0 :::8042                     :::*                        LISTEN      20422/java

Also, please find attached two grep results of the log file, captured while restarting via Ambari.

tail -f /var/log/hadoop-yarn/yarn/yarn-yarn-nodemanager-resourcemanager.log | grep -i 'error\|warn\|fatal\|severe' > /tmp/nodemanager1.log

Thank you so much for your support!


New Contributor

Maybe that helps:



No process listening on port 8042 though...

Hi @Chris K!
I guess you're missing the sticky bit on your /app-logs directory.
Could you try to run the following commands?

[hdfs@node2 ~]$ hdfs dfs -ls -d /app-logs
[hdfs@node2 ~]$ hdfs dfs -ls -h /app-logs
#Adding the sticky bit
hdfs dfs -chmod +t /app-logs
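For what it's worth, the HDFS sticky bit works like the local-filesystem one: in a world-writable directory such as /app-logs, only a file's owner can delete or rename entries, which is what YARN log aggregation expects. A minimal local sanity check of the same semantics (after the chmod above, the -ls output for /app-logs should likewise show a mode string ending in "t"):

```shell
# Sticky-bit semantics demonstrated on a local directory with plain chmod;
# on HDFS, "hdfs dfs -ls -d /app-logs" should afterwards show a mode string
# ending in "t", roughly like: drwxrwxrwt - yarn hadoop ... /app-logs
d=$(mktemp -d)
chmod 1777 "$d"                       # rwxrwxrwx plus the sticky bit
[ -k "$d" ] && echo "sticky bit set"  # prints "sticky bit set"
rmdir "$d"
```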

Hope this helps!

@Vinicius Higa Murakami I also got the same issue and added that sticky bit. It worked for me for some days, but now the NodeManager is going down again.

Hi @Punit kumar!
Could you share with us your logs?
BTW, I'd kindly ask you to open a new question so we can work in separate threads; it will also be easier for other HCC users to find the best answer 🙂

@Vinicius Higa Murakami After fixing the sticky bit error, I was again unable to start the NodeManager, and there was no error message in the log. When I tried to start it once more, the containers were failing; here is the log of that. I also have other development Hadoop clusters in AWS that were working previously, but now the NodeManager is going down in every cluster.
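When containers fail like that, the aggregated YARN logs are usually the quickest route to the real error. A minimal sketch (the application ID below is a placeholder; take the real one from the first command's output):

```shell
# List applications (including finished/failed ones) to find the one
# whose containers died ...
yarn application -list -appStates ALL
# ... then pull all of its aggregated container logs (placeholder ID).
yarn logs -applicationId application_1518000000000_0001
```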

New Contributor

Hi @Vinicius Higa Murakami,

I was absent a few days due to a business trip.

I just started my AWS instance to check your sticky-bit tip, but HDP started without any errors...?

I didn't do anything with it in the past days so I have no idea what's going on.

Could it be that these errors occur because of network issues during "rush hours" in AWS?

I'll keep an eye on it over the weekend...



Gotcha @Chris K! Well, good to know it's working now 🙂
And about the issue, are you using spot instances? Yeah, it's kind of strange for this to happen so suddenly; let us know if this mystery shows up again. Then we can take a look at your timeout configs and check for race condition issues as well.

The following JIRA explains something quite similar to your case.
Hope this helps!

New Contributor

Hi @Vinicius Higa Murakami,

unfortunately it wasn't that easy... The next time I started my environment, I had these strange errors again.

But - after quite some desperate hours of trial and error - I figured it out.

Whenever I started a brand new image, everything was fine. I didn't have any errors.

When I started the shut-down image again the next day, it was broken.

It occurred to me that cleanly stopping the HDP processes in Ambari, followed by a service ambari-agent stop and a service ambari-server stop, would be a nicer approach, and that helped.

When the processes are terminated correctly, the restart happens without any errors.

When I just shut down the AWS instance, it breaks.

That's it, plain and simple.
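For anyone hitting this later, that shutdown order can be sketched roughly like this (cluster name, host, and credentials are placeholders; stopping everything from the Ambari UI works just as well as the REST call):

```shell
#!/bin/sh
# 1. Ask Ambari to stop all services (target state INSTALLED = stopped).
#    "admin:admin", "localhost:8080" and "mycluster" are placeholders.
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo":{"context":"Stop all services"},"Body":{"ServiceInfo":{"state":"INSTALLED"}}}' \
  'http://localhost:8080/api/v1/clusters/mycluster/services'

# 2. Once Ambari reports everything stopped, stop the agent and server.
service ambari-agent stop
service ambari-server stop

# 3. Only now shut down the AWS instance.
```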

Well... the good point is: I learned a lot 😉

Thanks for your help!


Good one! Gotcha 🙂

I didn't know about this either.
Keep it up with your studies on HDPCA 😄