My production Hadoop cluster has two head nodes. Yesterday a failover happened from head node 1 to head node 0.
Head node 0 is now in active mode and head node 1 is in standby mode. After the failover, all the services were up and running on head node 0 (active), but on head node 1 all the services stayed in a stopped state for a long time (approximately 1 hour).
I then started all the services manually through Ambari. Now all the services are up and running on both head nodes.
Before starting the services, I checked the following:
1) I used the ping command to check whether head node 1 was able to communicate with the other nodes in the cluster. Result: success.
2) I checked the ambari-agent status to see whether the agent was running on head node 1. Result: success.
3) I checked Hive connectivity through beeline, using the ODBC connection string. Result: I was able to connect.
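
For anyone who wants to repeat these three checks in one go, here is a rough Python sketch. It is only an illustration, not what was actually run: the function names `build_checks` and `run_checks`, the host name, and the JDBC URL are made up for the example. It assumes it is run on head node 1 itself, that `ping`, `ambari-agent` and `beeline` are on the PATH, and that a zero exit code means the check passed. `ambari-agent status` may need root, and beeline may need extra credentials (for example `-n` and `-p`) depending on the cluster.

```python
import subprocess


def build_checks(target_host, jdbc_url):
    """Return (label, command) pairs mirroring the three manual checks.

    target_host and jdbc_url are placeholders supplied by the caller.
    """
    return [
        # 1) Can this node reach another node in the cluster?
        ("ping", ["ping", "-c", "3", target_host]),
        # 2) Is the local Ambari agent running?
        ("ambari-agent", ["ambari-agent", "status"]),
        # 3) Can beeline connect to HiveServer2 and run a trivial query?
        ("beeline", ["beeline", "-u", jdbc_url, "-e", "show databases;"]),
    ]


def run_checks(checks, runner=subprocess.run):
    """Run each command and map its label to True (exit code 0) or False."""
    results = {}
    for label, cmd in checks:
        try:
            completed = runner(cmd, capture_output=True, text=True, timeout=60)
            results[label] = completed.returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            # Command missing or hung: treat as a failed check.
            results[label] = False
    return results


# Example usage (hypothetical host name and URL):
# checks = build_checks("workernode0.example.internal",
#                       "jdbc:hive2://headnode1.example.internal:10000/default")
# for label, ok in run_checks(checks).items():
#     print(label, "Success" if ok else "Failed")
```

The `runner` argument exists only so the commands can be swapped for a stub when trying the script away from the cluster.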
I am a fresher and new to the Hadoop world, so I don't have much knowledge of troubleshooting yet. I am learning day by day.
I would like to know the reason behind this issue.
I would appreciate it if anybody could provide the correct explanation for it.
Thanks in advance!
I started the following services manually:
1) Standby NameNode
2) Standby ResourceManager
3) ZooKeeper Failover Controller
4) HiveServer2
5) Hive Metastore
6) Oozie
7) WebHCat Server
8) Ambari Metrics Monitor
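
In case it helps with diagnosis, a list like the one above can also be pulled from Ambari rather than read off the UI. The sketch below is an assumption-laden illustration: as far as I know, the Ambari REST API exposes per-host component state at `/api/v1/clusters/<cluster>/hosts/<host>/host_components?fields=HostRoles/state`, with `STARTED` for running components and `INSTALLED` for stopped ones. The function name `stopped_components` and the sample payload are invented for the example and are not output from this cluster.

```python
import json


def stopped_components(payload):
    """Return the sorted names of components whose state is not STARTED.

    payload is the JSON text of an Ambari host_components response
    (assumed shape: {"items": [{"HostRoles": {...}}, ...]}).
    """
    data = json.loads(payload)
    return sorted(
        item["HostRoles"]["component_name"]
        for item in data.get("items", [])
        if item["HostRoles"].get("state") != "STARTED"
    )


# Hypothetical response body, trimmed to the fields used above.
SAMPLE = json.dumps({
    "items": [
        {"HostRoles": {"component_name": "NAMENODE", "state": "INSTALLED"}},
        {"HostRoles": {"component_name": "ZKFC", "state": "INSTALLED"}},
        {"HostRoles": {"component_name": "HIVE_SERVER", "state": "STARTED"}},
    ]
})

print(stopped_components(SAMPLE))  # ['NAMENODE', 'ZKFC']
```

Running this against the real response for head node 1 right after a failover would show which components Ambari considers stopped before anything is restarted by hand.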