Explorer
Posts: 8
Registered: ‎11-24-2017

Explanation of Cloudera architecture on cloud (Azure)

Hello everyone! I am new to the Hadoop/Cloudera world and I need to set up a Cloudera cluster on the Microsoft Azure cloud.

If I understood correctly, there are two ways to install Cloudera on a cluster: using Cloudera Manager or through a manual installation.

According to this schema, it seems that a dedicated machine is needed for Cloudera Manager, plus 3 master nodes.

 

cloudera_azure.png

 

But according to this table, it seems I can install Cloudera Manager directly on a master node.

 

cloudera_azure_roles.png

 

So here are my doubts/questions:

 

1) Is it necessary to have Cloudera Manager on a dedicated machine (if yes, why)? Or can it be installed directly on a master node?

2) Why are there 3 master nodes? From what I understood, 2 master nodes can be used for high availability (they mirror each other, with the same configuration and services, and can be used for a hot switch). What is the purpose of the third master node, and why is it different from the other two?

3) What is the purpose of Cloudera Director, and how does it differ from Cloudera Manager? I've read that it can be used for automated deployments to the cloud, but it is not clear to me what exactly I could use it for.

 

Thanks in advance for any information. 

 

Expert Contributor
Posts: 152
Registered: ‎07-01-2015

Re: Explanation of Cloudera architecture on cloud (Azure)

Master nodes: Yes, HDFS has two master roles, but HDFS, YARN, HBase, the Failover Controller and many other applications depend on ZooKeeper, and ZooKeeper is what needs the 3 nodes. ZooKeeper works by majority vote (for example when electing its leader), so you want an odd number of nodes: 1, 3, 5, 7 and so on. If you choose just one ZooKeeper node and it fails, everything fails. Choosing 3 lets you tolerate 1 node failure, choosing 5 lets you tolerate 2, and so on.
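
To make the arithmetic concrete, here is a minimal sketch of the majority rule described above. It is plain Python, not a ZooKeeper API; the function names `quorum_size` and `tolerated_failures` are made up for illustration. It assumes only that the ensemble stays available as long as a strict majority of its nodes is up.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be at least 1")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many nodes can fail while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare odd and even ensemble sizes side by side.
    for n in range(1, 8):
        print(f"{n} nodes: quorum={quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under this rule a 4-node ensemble tolerates the same single failure as a 3-node one, which is why an even number of nodes buys nothing extra.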

 

Cloudera Manager -> yes, it can be on one of those masters, especially one where HDFS is not deployed. But in "complex" clusters, where there is an HBase master, Kudu master, Sentry, Hive, etc., there are so many master roles that it is recommended to put CM on a different machine. CM can eat a lot of CPU and I/O, because it collects a lot of data for charts/reports.

And if you have many clusters, you can have one CM as a management node for many masters/clusters.

 

Cloudera Director is just a setup node. I would not advise keeping it up; I would just use it for the deployment and that's it. Later on you will realize that many of the changes you need to make on the cluster are not covered by Cloudera Director, so it gets "unsynchronized".

 

 

Explorer
Posts: 8
Registered: ‎11-24-2017

Re: Explanation of Cloudera architecture on cloud (Azure)

Thank you very much for the answer!

I am still a bit confused about Cloudera Director. I've just configured a cluster on Microsoft Azure using Cloudera Manager. Through CM I've installed all the CDH components and services, and the cluster is up and running. So if I can use Cloudera Manager to set up and configure all the CDH components and the cluster, what should I use Director for?

 

Expert Contributor
Posts: 152
Registered: ‎07-01-2015

Re: Explanation of Cloudera architecture on cloud (Azure)

Director is good for you if you don't want to mess with VM provisioning, network settings, repo preparation and Java deployment. It is also useful because you can extend the cluster or make an exact copy of it. This is something you cannot do when you have deployed everything from CM.
Explorer
Posts: 8
Registered: ‎11-24-2017

Re: Explanation of Cloudera architecture on cloud (Azure)

OK, that is a bit clearer. From what I can see here, there are 3 installation methods for Cloudera (Path A/B/C), and Director is not used in any of them. Is there any resource/doc/tutorial that explains how to install a Cloudera cluster using Director?

 

Thanks for the help
