HDFS High availability

I have set up HDFS in an HA environment. My question: does ZooKeeper need to be installed on all nodes (NameNodes and DataNodes)?

Re: HDFS High availability

No, ZooKeeper does not need to be installed on all nodes when you use HA on HDFS.

Re: HDFS High availability

@Edgar Daeds, I am using Ambari and it shows me 1 ZooKeeper client installed. I think the client needs to be installed, because otherwise how would ZooKeeper know which NameNode is active or standby?

I see something strange happening with my cluster. When connectivity between the two NameNode machines is lost and the standby NameNode happens to be on the machine where Ambari is installed, that NameNode becomes active as soon as connectivity is lost. But when the standby NameNode is on the other machine, where Ambari isn't installed, it always remains standby after connectivity is lost. Am I missing something in the HA cluster setup?

Re: HDFS High availability

Could you reword your problem/issue? It was tough to follow.

Re: HDFS High availability

You need 3 ZooKeeper SERVERS and you need 3 JournalNodes for HDFS HA. Did you set up NameNode HA via Ambari?
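
A toy Python sketch of the majority rule behind the "3 JournalNodes" recommendation (this is not Hadoop code; `majority` and `edit_committed` are hypothetical names). With the Quorum Journal Manager, an edit-log write from the active NameNode only counts once a majority of JournalNodes acknowledge it, so 3 JournalNodes tolerate one being down.

```python
def majority(n):
    """Smallest number of nodes that forms a strict majority of n."""
    return n // 2 + 1


def edit_committed(acks):
    """Toy model: an edit-log write succeeds only if a majority of
    JournalNodes acknowledged it (True = ack received)."""
    return sum(acks) >= majority(len(acks))


# 3 JournalNodes, one down: 2 of 3 acks is still a majority.
print(edit_committed([True, True, False]))   # True
# 3 JournalNodes, two down: 1 of 3 acks is not a majority, so the write fails.
print(edit_committed([True, False, False]))  # False
```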

Re: HDFS High availability

@Ancil McBarnet

Yes, I did this via Ambari, and I have exactly the setup you describe in your answer. Please read my comment on @Edgar Daeds's answer; that's what I am facing right now.

Re: HDFS High availability

It's tough to explain the scenario, but I'll try my best.

I have set up NameNode HA using Ambari. My active NameNode is on the machine where Ambari is installed, alongside a ZooKeeper client and a ZooKeeper server. The standby NameNode is on another machine where the ZooKeeper client is not installed but a ZooKeeper server is. Now, when network connectivity between these two machines goes down, the passive NameNode doesn't turn active. Or what should the behavior be in this case, given that there is already one active NameNode in the cluster?

So my question is: is the ZooKeeper client what tells the NameNode to be active or passive?

Re: HDFS High availability

@Viraj Vekaria

So the ZooKeeper Failover Controller (ZKFC) tells the NameNode to become active or passive. You should have three JournalNodes and two ZooKeeper Failover Controllers.
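
Roughly, each ZKFC races to create an ephemeral lock znode in ZooKeeper; the one that gets it makes its NameNode active, and the other keeps its NameNode in standby. The Python below is only an in-memory toy of that idea, assuming no real ZooKeeper connection. `ToyZooKeeper` and `ToyZKFC` are hypothetical names, and the real controller also fences the previously active NameNode, which is omitted here.

```python
class ToyZooKeeper:
    """In-memory stand-in for an ensemble holding one ephemeral lock znode."""

    def __init__(self):
        self.lock_owner = None  # id of the session currently holding the lock

    def try_create_lock(self, session):
        # Creating the ephemeral znode succeeds only if it does not exist yet.
        if self.lock_owner is None:
            self.lock_owner = session
            return True
        return False

    def session_expired(self, session):
        # An ephemeral znode disappears when its owner's session ends.
        if self.lock_owner == session:
            self.lock_owner = None


class ToyZKFC:
    """Hypothetical failover controller for one NameNode."""

    def __init__(self, name, zk):
        self.name = name
        self.zk = zk
        self.nn_state = "standby"

    def run_election(self):
        # Winning the lock promotes this controller's NameNode to active.
        if self.zk.try_create_lock(self.name):
            self.nn_state = "active"
        return self.nn_state

    def lose_session(self):
        # Losing the ZooKeeper session releases the lock; demote the NameNode.
        self.zk.session_expired(self.name)
        self.nn_state = "standby"


zk = ToyZooKeeper()
zkfc1, zkfc2 = ToyZKFC("nn1", zk), ToyZKFC("nn2", zk)
print(zkfc1.run_election(), zkfc2.run_election())  # active standby

zkfc1.lose_session()         # e.g. nn1's host crashes
print(zkfc2.run_election())  # active
```

In this model a controller can only become active while it holds a session that can create the lock, which is why the controllers, not the ZooKeeper client tools, drive the active/standby decision.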

Do they need the ZooKeeper client installed? I'm not sure. They will not use the client command-line utilities (zkCli.sh etc.), but they do need the ZooKeeper jars. They might have them in their own lib folder, or depend on the ZooKeeper client to provide them. I have seen both approaches.

Normally Ambari installs all needed clients during an install, but it has been known to forget one before. So if you want to make sure, install the client from the host page (the +Add button on the host's web page).

But I think it's unlikely that this is the problem; if it were, you should see some ClassNotFoundExceptions somewhere (in the ZooKeeper Failover Controller logs).
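
To rule this out quickly, a small Python helper can scan the ZKFC logs for missing-class errors. The log path in the example is an assumption (a typical HDP default), so adjust it for your cluster; the helper also matches `NoClassDefFoundError`, the other error a missing jar commonly produces.

```python
import glob


def find_class_not_found(log_glob):
    """Return (file, line) pairs for every ClassNotFoundException or
    NoClassDefFoundError found in the files matching log_glob."""
    hits = []
    for path in sorted(glob.glob(log_glob)):
        with open(path, errors="replace") as f:
            for line in f:
                if "ClassNotFoundException" in line or "NoClassDefFoundError" in line:
                    hits.append((path, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Assumed log location; change it to wherever your ZKFC writes its logs.
    pattern = "/var/log/hadoop/hdfs/hadoop-hdfs-zkfc-*.log"
    for path, line in find_class_not_found(pattern):
        print(f"{path}: {line}")
```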

http://hortonworks.com/blog/namenode-high-availability-in-hdp-2-0/

Re: HDFS High availability

@Viraj Vekaria Can you please post some screenshots of your ZooKeeper and NameNode configurations?

For high availability you need a quorum (e.g. 3, 5, or 7 nodes) of ZooKeeper servers and JournalNodes.
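
The arithmetic behind the odd sizes can be checked with a few lines of Python (the function names are hypothetical). An ensemble of n servers keeps working while a strict majority is reachable, so adding a 4th server to 3 buys no extra fault tolerance. During a network split, only the side that still reaches a majority can keep operating.

```python
def majority(n):
    """Smallest strict majority of n voting members."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many members can fail while a majority remains."""
    return n - majority(n)


def side_has_quorum(servers_on_side, ensemble_size):
    """True if one side of a network partition still holds a majority."""
    return servers_on_side >= majority(ensemble_size)


for n in (3, 4, 5, 7):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 4 3 1   (the 4th server adds no fault tolerance)
# 5 3 2
# 7 4 3

# 3 servers split 2 | 1: only the 2-server side keeps quorum.
print(side_has_quorum(2, 3), side_has_quorum(1, 3))  # True False
```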

Re: HDFS High availability

ZooKeeper just helps to manage which NameNode is currently active.
