Confused about HDFS HA using ZooKeeper & QJM

Hi everyone,

I am new to Hadoop and Cloudera Manager. I have several questions about Hadoop, and I hope you can help me understand it better, especially the Cloudera side. I have added only three services in Cloudera Manager (HDFS, MapReduce, and ZooKeeper), because my research topic is HDFS High Availability and that is all I need to focus on.

These are the roles that I put on each node:

  1. Master node
  • HDFS Balancer
  • HDFS JournalNode
  • HDFS NameNode
  • HDFS SecondaryNameNode
  • HDFS HttpFS
  • MapReduce JobTracker
  • Zookeeper server

  2. Standby node

  • HDFS JournalNode
  • HDFS HttpFS
  • MapReduce JobTracker
  • Zookeeper server

  3. Slave nodes

  • HDFS DataNode
  • HDFS HttpFS
  • HDFS Gateway
  • HDFS Failover Controller
  • MapReduce Failover Controller
  • MapReduce TaskTracker

These are my questions:

  1. How does Cloudera Manager (CM) assign roles to each node? When CM deploys a service to every slave node, does that mean all of the service's roles have been deployed there as well, just inactive? And do they become active only when I assign the roles? For example, I added the HDFS service to 4 nodes (1 master, 3 slaves). Does that mean all roles, including NameNode, DataNode, etc., are already deployed on each of them but inactive, and become active once I assign them in CM? Or is there another explanation of how CM assigns roles to each node?
  2. When I read the documentation about JournalNodes and ZooKeeper, it said I need to deploy each of them on an odd number of nodes, with a minimum of 3 nodes to tolerate 1 failure. It recommends putting a JournalNode on each of the hosts running the Active and Standby NameNodes, and the third JournalNode on similar hardware, such as the JobTracker host. What does that mean? I don't quite understand which nodes should run the JournalNodes. I put them on the Active and Standby NameNode hosts, but I don't know where to put the third one, because my JobTracker is on the same node as the Active NameNode (the Master node). So where should I put the last JournalNode and ZooKeeper server?
  3. What are the functions of HDFS HttpFS, Balancer, Failover Controller, and Gateway? I see them in the HDFS service. Are they necessary for the cluster? Mine is just a simple Hadoop cluster.
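
For question 2, this is how I currently understand the "odd number, minimum 3" rule: JournalNodes and ZooKeeper servers both need a strict majority of their members to stay up, so N nodes can survive (N - 1) / 2 failures, rounded down. Below is a minimal Python sketch of that arithmetic. The function names are my own and are not part of Hadoop or Cloudera Manager; please correct me if the reasoning is wrong.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a strict majority of n."""
    if n < 1:
        raise ValueError("need at least one node")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many of n nodes can fail while a strict majority is still up."""
    return n - majority(n)


if __name__ == "__main__":
    # Compare small ensemble sizes (hypothetical JournalNode/ZooKeeper counts).
    for n in range(1, 6):
        print(f"{n} nodes: majority={majority(n)}, "
              f"tolerated failures={tolerated_failures(n)}")
```

If this is right, 2 nodes tolerate no failures and 4 nodes tolerate no more than 3 do, which would explain why an odd number of at least 3 is recommended, and why my current layout with only two JournalNodes and two ZooKeeper servers is not enough.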


Thanks in advance. Any help is much appreciated.