11-07-2013 08:25 AM - edited 11-07-2013 08:46 AM
Is it possible to install ZooKeeper via CM after the initial installation? I am trying to start the "failoverController" to enable auto failover, but CM keeps squawking about ZooKeeper not being available even though I manually installed ZK via the yum repository on the 3 nodes.
It is running on the 3 nodes as leader, follower, follower, but for some reason CM doesn't recognize that it's installed. I thought maybe it has to be "registered" with CM in some way. (Just FYI: I tried 3 times to install ZK during the initial CM install, but it would never start correctly even though there were zero errors during the inspection.)
# /usr/lib/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/lib/zookeeper/bin/../conf/zoo.cfg
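To double-check the quorum roles from the shell on all three nodes, something like this works (the hostnames are placeholders for your ZK nodes, and it assumes the default client port 2181 and that `nc` is installed):

```shell
#!/usr/bin/env bash
# Query a ZK node's role with the four-letter "srvr" command over the client port.
zk_mode() {
  # Prints e.g. "Mode: leader" or "Mode: follower" for the given host
  echo srvr | nc "$1" 2181 | grep '^Mode:'
}

check_quorum() {
  local host
  for host in "$@"; do
    printf '%s -> %s\n' "$host" "$(zk_mode "$host")"
  done
}

# Usage (placeholder hostnames): check_quorum node1 node2 node3
```

You should see exactly one leader and the rest followers if the quorum is healthy.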
11-07-2013 11:08 AM
Did you try adding a ZooKeeper service to your cluster? From the home page, click the arrow on the same line as your cluster name, then click on Add Service.
If you want other services to use that ZooKeeper, don't forget to update their configuration as well.
11-07-2013 03:17 PM
Thanks, I had already started over by the time I got your response, but I did not see that option earlier... so good to know it's there.
By the way, toward the end of a fresh install before CM tries to start all of the services, I have to go chown all of the directories on all nodes to the correct users:groups because they are all owned by root. This causes the services to fail when starting.
For example, the namenode will fail to format because the dfs directory is not owned by hdfs user, and ZK will fail to start because dir is not owned by zookeeper. Perhaps a bug?
11-08-2013 08:30 AM
I'm doing a fresh install of CDH via CM on 5 fresh minimal CentOS 6.4 VMs. My dirs are set up like:
At the end of the CM install after setting all of the directory paths for each role/service, the directories are still owned by root. If I click "Continue", CM will attempt to start the services and the NN will fail to format and ZooKeeper will fail to start. So before I click Continue, I have to go to the CLI and:
chown -R hdfs:hdfs /data/01/dfs
chown -R hdfs:hdfs /data/01/local
chown -R mapred:mapred /data/01/mapred
chown -R zookeeper:zookeeper /data/01/zookeeper
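To avoid repeating those four commands by hand on every node, a loop like this helps (the hostnames are placeholders, and the paths assume the /data/01 layout above):

```shell
#!/usr/bin/env bash
# Emit the ownership fixes once, then replay them on each node.
NODES="node1 node2 node3 node4 node5"   # placeholder hostnames

fix_perms() {
  # Prints the chown commands; pipe to "ssh $node sudo bash" to apply remotely.
  cat <<'EOF'
chown -R hdfs:hdfs /data/01/dfs
chown -R hdfs:hdfs /data/01/local
chown -R mapred:mapred /data/01/mapred
chown -R zookeeper:zookeeper /data/01/zookeeper
EOF
}

for node in $NODES; do
  echo "== $node =="
  fix_perms          # dry run; replace with: fix_perms | ssh "$node" sudo bash
done
```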
11-08-2013 10:19 AM
I see what's happening. It sounds like the dirs were created before the configuration was passed to CM. If you are going to be manually creating the dirs, then CM assumes you'll set them up with the right perms and won't chown them. This is to help make sure we don't accidentally break something if you made a mistake.
If you want to have CM set this all up for you, then you could create /data/01/, but let the leaf directories be created by CM.
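For example, on each node you would pre-create only the mount-point parent and nothing below it (path taken from the layout earlier in the thread):

```shell
#!/usr/bin/env bash
# Create only the parent directory; deliberately do NOT pre-create the
# leaf dirs (dfs/, mapred/, zookeeper/) so CM creates them itself with
# the correct owners and permissions.
prep_parent() {
  mkdir -p "$1"
}

# Usage: prep_parent /data/01
```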
11-08-2013 01:27 PM
Okay, I see. That makes sense, but it would be nice to add that to the documentation :-)
I think I have it all working nicely now and in good health, except the failover controllers won't start for some reason. I have HA and auto-failover enabled, but the 3 failover controllers are stopped. Any idea how to address these errors when trying to start them?
Unable to start failover controller. Parent znode does not exist.
And this is from the Stderr log
Exception in thread "main" org.apache.hadoop.HadoopIllegalArgumentException: Configuration has no addresses that match local node's address. Please configure the system with mapred.ha.jobtracker.id
    at org.apache.hadoop.mapred.HAUtil.getSuffixIDs(HAUtil.java:241)
    at org.apache.hadoop.mapred.HAUtil.getJobTrackerId(HAUtil.java:158)
    at org.apache.hadoop.mapred.tools.MRZKFailoverController.create(MRZKFailoverController.java:123)
    at org.apache.hadoop.mapred.tools.MRZKFailoverController.main(MRZKFailoverController.java:172)
11-08-2013 02:26 PM
Did you manually add the failover controllers, or did you use the wizard to enable automatic failover?
The wizard should have done the zookeeper initialization for you.
11-08-2013 02:55 PM
I installed ZK via CM during the initial install and enabled HA via the wizard. It's odd because the failover controllers for the HDFS service show as started, but the failover controllers assigned to the MapReduce service show as stopped/down (hence the error).
11-08-2013 03:10 PM
Sorry, I missed that you were talking about the MR failover controllers. Did you go through the Job Tracker High Availability Wizard? This should have initialized the mapreduce zk failover controllers (which have separate ZooKeeper data from the HDFS ones).
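If you want to verify by hand which parent znodes exist, you can list the root of the ZooKeeper tree with the client shipped in the ZK package (same install path as your zkServer.sh; the exact znode names vary, though /hadoop-ha is typical for the HDFS failover controllers on CDH):

```shell
#!/usr/bin/env bash
# List top-level znodes so you can see whether the HDFS and MapReduce
# failover controllers each have their parent znode initialized.
ZKCLI=/usr/lib/zookeeper/bin/zkCli.sh   # path matching the package install above

list_root_znodes() {
  "$ZKCLI" -server "$1:2181" ls /
}

# Usage (placeholder hostname): list_root_znodes node1
```

If the MapReduce parent znode is missing, re-running the JobTracker HA wizard should create it.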