Hi everyone,
I am new to Hadoop and Cloudera Manager, and I have several questions about Hadoop that I would like to ask. I hope you can help me better understand Hadoop, especially Cloudera. I have added only three services in Cloudera Manager (HDFS, MapReduce, and ZooKeeper), because I only need to focus on my research topic, which is HDFS High Availability.
These are the roles that I put on each node:
1. Master node
- HDFS Balancer
- HDFS JournalNode
- HDFS NameNode
- HDFS SecondaryNameNode
- HDFS HttpFS
- MapReduce JobTracker
- ZooKeeper Server
2. Standby node
- HDFS JournalNode
- HDFS HttpFS
- MapReduce JobTracker
- ZooKeeper Server
3. Slave nodes
- HDFS DataNode
- HDFS HttpFS
- HDFS Gateway
- HDFS Failover Controller
- MapReduce Failover Controller
- MapReduce TaskTracker
These are my questions:
- How does Cloudera Manager (CM) assign roles to each node? When CM deploys a service to the nodes, does that mean every role in that service has already been deployed to each node, just in an inactive state, and that assigning a role only activates it? For example, I added the HDFS service to four nodes (1 master, 3 slaves). Does that mean all roles, including NameNode, DataNode, etc., have already been deployed to each node but are inactive, and become active only when I assign them using CM? Or is there another explanation of how CM assigns roles to each node?
- The documentation on JournalNodes and ZooKeeper says that I need to deploy each of them on an odd number of nodes, with a minimum of 3 nodes to survive one node failure. It recommends putting one JournalNode on each of the hosts running the Active and Standby NameNodes, and the third JournalNode on similar hardware, such as the JobTracker host. What does that mean? I don't quite understand which nodes I should put the JournalNodes on. I put them on the Active and Standby NameNode hosts, but I don't know where to put the third one, because my JobTracker runs on the same node as the Active NameNode (the Master node). So where should I put the last JournalNode and ZooKeeper server?
- What are the functions of the HDFS HttpFS, Balancer, Failover Controller, and Gateway roles? I see them in the HDFS service. Are they necessary for the cluster? Mine is just a simple Hadoop cluster.
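To check my understanding of the odd-number rule in the second question, here is a minimal sketch of the majority arithmetic as I understand it. It assumes that both the JournalNode group and the ZooKeeper ensemble stay usable only while a majority of their members are up. The function names are my own and are not part of any Hadoop or Cloudera API.

```python
def majority(n: int) -> int:
    """Smallest number of nodes that forms a majority of n nodes."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many of n nodes can fail while a majority is still running."""
    return n - majority(n)


# Compare group sizes (assumption: majority quorum for JournalNodes and ZooKeeper).
for n in range(1, 6):
    print(f"nodes={n} majority={majority(n)} tolerated_failures={tolerated_failures(n)}")
```

If this is right, 3 nodes tolerate 1 failure and 4 nodes still tolerate only 1, which would explain why an odd number is recommended.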
Thanks in advance. Any help is much appreciated.