My production Hadoop cluster has two head nodes. Yesterday a failover occurred from head node 1 to head node 0.
Head node 0 is now in active mode and head node 1 is in standby mode. After the failover, all the services were up and running on head node 0 (active), but on head node 1 all the services stayed in a stopped state for a long time (approximately 1 hour).
I then started all the services manually through Ambari. Now all the services are up and running on both head nodes.
Before starting the services, I checked the following:
1) I used the ping command to check whether head node 1 could communicate with the other nodes in the cluster. Result: success.
2) I checked the ambari-agent status to see whether the agent was running on head node 1. Result: success.
3) I checked Hive connectivity through Beeline, using the ODBC connection string. Result: I was able to connect.
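
For reference, checks like 1) and 3) above can be roughly approximated in a script. The following is only a minimal sketch: it tests TCP reachability of a service port rather than ICMP ping, and the function names `port_open` and `check_endpoints` are made up for illustration. The host names and port numbers you pass in are assumptions that depend on your own cluster configuration.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    This only shows that something is listening on the port; it does not
    prove that the service behind it is healthy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_endpoints(endpoints, timeout: float = 3.0):
    """Map each (host, port) pair to its True/False reachability."""
    return {(host, port): port_open(host, port, timeout) for host, port in endpoints}
```

A call such as `check_endpoints([("headnode1.example.com", 10000)])` would test a HiveServer2 listener, assuming the hypothetical host name is replaced with a real one and that HiveServer2 uses its usual default port of 10000 in binary mode (this may differ on your cluster).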
As I am a fresher and new to the Hadoop world, I don't have much knowledge of troubleshooting yet. I am learning day by day.
I would like to know the reason behind this issue.
I would appreciate it if anybody could provide the correct explanation for this issue.
Thanks in advance!
I started the following services manually:
1) Standby NameNode.
2) Standby ResourceManager.
3) ZooKeeper Failover Controller.
8) Ambari Metrics Monitor.
@ss00552277 - Did you check the logs for those services and find any error messages? For example, for Hive, check the "/var/log/hive/" log directory.
I would also suggest checking the host itself. Since all your services were down on one host, there may be an issue on the OS side as well. Check /var/log/messages and dmesg on the host.
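
If going through the logs by hand is tedious, something like the following could help narrow things down. It is only a minimal sketch: the keyword list in `PATTERN` is an assumption about what a problem line looks like, and the helper names `find_suspect_lines` and `scan_log_dir` are made up for illustration. You would still need to read the surrounding lines to understand what actually happened around the failover time.

```python
import re
from pathlib import Path

# Keywords that often mark problems in Hadoop service logs and in syslog.
# This list is an assumption; adjust it to what your logs actually contain.
PATTERN = re.compile(r"ERROR|FATAL|Exception|Out of memory|oom-killer")


def find_suspect_lines(log_path, pattern=PATTERN):
    """Return (line_number, text) for every line in log_path matching pattern."""
    hits = []
    # errors="replace" avoids crashing on stray non-UTF-8 bytes in the log.
    with open(log_path, "r", errors="replace") as fh:
        for lineno, line in enumerate(fh, start=1):
            if pattern.search(line):
                hits.append((lineno, line.rstrip("\n")))
    return hits


def scan_log_dir(log_dir, glob="*.log", pattern=PATTERN):
    """Scan files in log_dir matching glob; return {path: hits} for files with hits."""
    results = {}
    for path in sorted(Path(log_dir).glob(glob)):
        if path.is_file():
            hits = find_suspect_lines(path, pattern)
            if hits:
                results[str(path)] = hits
    return results
```

For example, `scan_log_dir("/var/log/hive")` would cover the Hive logs and `find_suspect_lines("/var/log/messages")` the system log (reading it usually requires root). Since dmesg is a command rather than a file, its output would have to be saved to a file first before scanning it this way.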