Secondary NameNode Host has 4 service problems after HA NameNode Rollback [Errno 111]

Hi there,

All of my hosts and services were working. I then tried to enable NameNode HA, but the wizard failed, so I rolled back following the documented HA rollback procedure.

I'm guessing troubleshooting these errors individually will not get me far, as I'm sure it is not a coincidence that they all happened after this rollback, and only on the secondary NameNode. The other services run and aren't crashing on this host (storm, falcon, yarn, flume, hive, etc.)

Is there a way to troubleshoot the Secondary NameNode a bit? I'm very new to Hadoop and Ambari.

Service problems are as follows, and these issues reoccur exactly the same after clean-up, agent restarts, server restarts, etc.:

Oozie Server: [Errno 111] Connection refused. Startup succeeds.

DataNode: [Errno 111] Connection refused. Was up for about 10 minutes; startup fails now.

RegionServer: [Errno 111] Connection refused. Startup succeeds, but it crashes later.

Accumulo TServer: [Errno 111] Connection refused. Startup succeeds, but it fails a few seconds later.

1 ACCEPTED SOLUTION

Mentor

Go to the /var/log/hadoop directory, navigate to the Secondary NameNode log directory, and start reviewing the logs there. If you have a custom location for your logs, you can find it in the Ambari configs section of the HDFS service. Once you find the log, feel free to post your errors.
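A minimal sketch of that log review, for anyone following along. The `hdfs` subdirectory and the `*secondarynamenode*.log` file pattern are assumptions based on common defaults and are not confirmed in this thread; point `LOG_DIR` at whatever path your Ambari HDFS config shows.

```shell
# Assumed default log location; override LOG_DIR if Ambari shows a custom path.
LOG_DIR="${LOG_DIR:-/var/log/hadoop/hdfs}"

# Print the last 20 ERROR/FATAL/Exception lines from each Secondary NameNode log in a directory.
scan_snn_logs() {
  dir="$1"
  for f in "$dir"/*secondarynamenode*.log; do
    # Skip the unexpanded pattern when no log file matches.
    [ -f "$f" ] || continue
    echo "== $f"
    # grep exits non-zero when nothing matches; that is not an error here.
    grep -E 'ERROR|FATAL|Exception' "$f" | tail -n 20 || true
  done
}

scan_snn_logs "$LOG_DIR"
```

Before posting the output, replace hostnames and IP addresses with placeholders.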

6 REPLIES

Mentor

I did look at your logs, but I rejected the post so it wouldn't be published, because it contains your site's server names and IPs. It's hard to tell what the problem is. If you have a support agreement with HWX, I'd recommend opening a support case.

Oops, thank you! I can repost them with x's instead if you want. I've discovered that part of the issue was actually a disk space issue. Things are much cleaner now that that's been solved, but my Secondary NameNode process still says it can't connect.
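For readers who want to rule out the same disk space cause, here is a small sketch that lists filesystems at or above a usage threshold. The 90% threshold and the function name `disks_over` are illustrative choices, not something from this thread.

```shell
# List mount points whose usage is at or above the given percentage.
disks_over() {
  threshold="$1"
  # df -P prints one POSIX-format line per filesystem; column 5 is capacity, column 6 is the mount point.
  df -P | awk -v t="$threshold" 'NR > 1 {
    use = $5
    sub(/%/, "", use)
    if (use + 0 >= t) print $6, $5
  }'
}

# Illustrative threshold of 90 percent.
disks_over 90
```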

Mentor

Perhaps you want to attempt to set up HA again now that you've found the root cause?

It caused such a disaster last time that I don't want to mess anything up more than it already is... but maybe that's the best option at this point.

Thank you for your advice and help @Artem Ervits, I really appreciate it.

Contributor

@Savanna Endicott: I have also done the HA rollback. Could you please send the link to those docs? During my rollback I got an error from the following command:

" curl --negotiate -u root:hashmap "X-Requested-By: ambari" -i -X POST -d '{"host_components" : [{"HostRoles":{"component_name":"navideh02.hash.net"}] }' http://localhost:8080/api/v1/clusters/NHDP/hosts?Hosts/host_name=navideh02.hash.net"

please correct the above command
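Three things stand out in the command as posted: the header string has no `-H` flag in front of it, the JSON body is missing a closing brace, and `component_name` holds a hostname where a component name is expected. The sketch below assumes the intent was to re-create the Secondary NameNode component on that host, which this thread does not confirm. `AMBARI_USER` and `AMBARI_PASSWORD` are placeholders, and `--negotiate` is left out because it is only needed when the endpoint uses SPNEGO authentication. The script only prints the command so it can be reviewed before being run by hand.

```shell
# Values copied from the post above; replace with your own.
AMBARI_URL="http://localhost:8080"
CLUSTER="NHDP"
SNN_HOST="navideh02.hash.net"
# Assumption: the component being re-created is the Secondary NameNode.
COMPONENT="SECONDARY_NAMENODE"

# Print a reviewable curl command instead of sending the request.
print_curl_cmd() {
  # Balanced JSON body; component_name takes a component name, not a hostname.
  body=$(printf '{"host_components":[{"HostRoles":{"component_name":"%s"}}]}' "$COMPONENT")
  url="$AMBARI_URL/api/v1/clusters/$CLUSTER/hosts?Hosts/host_name=$SNN_HOST"
  printf "curl -u 'AMBARI_USER:AMBARI_PASSWORD' -H 'X-Requested-By: ambari' -i -X POST -d '%s' '%s'\n" "$body" "$url"
}

print_curl_cmd
```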
