Hi all, I would like to set up a demo cluster with 3 nodes and some slaves. I followed the Ambari automatic installation guide.
Firstly, in the Ambari web setup, at the Install Options / Confirm Hosts step, should I list only the 3 masters and the slaves without the Ambari host, or do I have to add the Ambari host as a master? I read that the Metrics Collector should be installed on the Ambari host.
Secondly, at the Assign Masters step, Ambari automatically assigns components across the three masters. Is there any best practice for this assignment?
And thirdly, I created an extra, equally sized partition on all masters and slaves for HDFS use, mounted as /grid/1. Where do I specify this partition for HDFS use?
It's your choice whether you want the host where ambari-server is installed to also be a member of the cluster. Considering it's a 3-node demo cluster, I would suggest including the host where ambari-server is running.
You can rely on the default assignment.
In the HDFS config page during the install, you can modify the default values - e.g. screen-shot-2016-04-17-at-101556-am.png
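As a concrete sketch (the subdirectory below is an assumed convention, not something Ambari mandates; only the /grid/1 mount point comes from the question), pointing HDFS at that partition comes down to the DataNode directories property, which in raw hdfs-site.xml terms looks like:

```xml
<!-- hdfs-site.xml: DataNode storage location.
     The hadoop/hdfs/data subdirectory is an illustrative convention;
     only the /grid/1 mount point is taken from the question. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/grid/1/hadoop/hdfs/data</value>
</property>
```

In the Ambari wizard this corresponds to the "DataNode directories" field on the HDFS Configs page, which is where you would enter /grid/1 instead of the default.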
Do you plan to use three masters and some DataNodes in addition to that? VMs or real physical machines? I.e. do you have proper servers (6-12 cores, 64 GB+ RAM), or are these restricted VMs for a demo? The answers will be different.
For powerful servers I would normally do:
Master1 (HDFS1): NameNode1, QJM, ZooKeeper
Master2 (HDFS2): NameNode2, QJM, ZooKeeper
Master3: QJM, ZooKeeper, ResourceManager (plus other MapReduce components), Hive, Oozie, everything else
The most resource-hungry and crucial component in a cluster is the NameNode. It needs a lot of RAM and CPU, and if it is impacted, everything else will die. So it is normally good to put the NameNodes on dedicated nodes. The ResourceManager is the third "heavy" component, so it's good to put it on the third node.
But this is not set in stone. If you are memory-restricted, you can also distribute services more equally. Also, if you want Oozie, ResourceManager, or Hive HA, you should distribute those services equally across Master1 and Master2.
The important thing is to do some decent RAM planning to make sure every service has enough memory and you do not exceed the total RAM of the cluster.
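A quick way to sanity-check such a plan is to sum the planned heap sizes per node and compare against the physical RAM. The heap figures below are purely illustrative assumptions for a small demo master, not tuning recommendations:

```python
# Back-of-envelope RAM budget for one master node.
# All heap sizes (GB) are illustrative assumptions, not tuning advice.
heaps_gb = {
    "NameNode": 4,
    "ZooKeeper": 1,
    "JournalNode": 1,
    "Metrics Collector": 2,
    "OS + overhead": 4,
}

node_ram_gb = 16  # assumed physical RAM of the node
total = sum(heaps_gb.values())

print(f"planned {total} GB of {node_ram_gb} GB")
assert total <= node_ram_gb, "over-committed: move services to another node"
```

Repeating this per host quickly shows whether a layout like the one above fits, or whether a service needs to move to a less loaded node.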
"should be there only the 3 masters and slaves without ambari, or do I have to add ambari as a master. I read that metrics collector should be installed on ambari."
On a demo cluster like this I would put Ambari on one of the master nodes; on a cluster of that size it shouldn't take many resources. Where did you read that the Metrics Collector should be colocated with Ambari? I don't think this is a requirement, since the Metrics Collector has a REST interface. They will communicate a lot, that is true, but a local network should be fast enough.
"and thirdly I created an extra equal partition on all masters and slaves for hdfs use, mounted as /grid/1"
So you only have one disk? Normally the slaves would have a couple of local drives, each with its own mount point (e.g. /grid/0, /grid/1, ...), while the masters have a single RAIDed disk with a partition containing the fsimage data etc.
Now you can obviously call the partition /grid/1 on the masters as well; there is nothing that stops that.
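To make that layout concrete (the device names and the /grid/0 mount point are assumptions, following the /grid convention above), a slave's /etc/fstab might carry one entry per local data disk, while a master carries a single RAIDed volume:

```
# slave: one mount point per local data disk (illustrative device names)
/dev/sdb1  /grid/0  ext4  defaults,noatime  0 0
/dev/sdc1  /grid/1  ext4  defaults,noatime  0 0

# master: single RAIDed volume, used for NameNode metadata
/dev/md0   /grid/1  ext4  defaults,noatime  0 0
```

HDFS then gets one data directory per mount point on the slaves, so each DataNode can spread its blocks across all of its disks.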
Hi @sotak, it is not really clear to me whether you want to use 3 nodes in total, or to have 3 master nodes plus some slaves as separate nodes.
This influences the layout of your cluster dramatically.
In addition to that, it is important which components you are going to use (Hive, Spark, HBase, ...), since they have different resource requirements.
If you have 3 dedicated master nodes, I agree with @Benjamin Leonhardi's suggestion as a starting point; adjust it accordingly if you want more HA functionality (ResourceManager, HiveServer) or if you need HBase as well (a single HMaster, or HMaster HA).
For the master nodes you do not need a dedicated partition for HDFS usage; the masters (usually) do not store HDFS data at all. Use that mount point for the NameNode's metadata on those nodes instead (dfs.namenode.name.dir). For the slave nodes, specify the directory for HDFS usage in the property dfs.datanode.data.dir.
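Sketched in raw hdfs-site.xml terms (the subdirectory names are illustrative assumptions; only the /grid/1 mount point comes from the thread), that split looks like:

```xml
<!-- on the masters: where the NameNode keeps its fsimage/edits metadata -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/grid/1/hadoop/hdfs/namenode</value>
</property>

<!-- on the slaves: comma-separated list of DataNode data directories,
     one per data disk -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/grid/1/hadoop/hdfs/data</value>
</property>
```

Note there is no separate config file per host role: both properties live in hdfs-site.xml, but the NameNode only reads the first and the DataNodes only read the second. If individual hosts need different values, Ambari manages that through config groups rather than separate files.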
Thank you all for your help. I decided the Ambari server should stay out of the cluster, so I have 3 masters (NameNode, ResourceManager and HBase Master) and some VMs as slaves. Now I didn't quite understand the best way to deploy HDFS inside the cluster: should the masters have their own partition/disk and the slaves another? Should the partitions/disks be of equal size? Can you give an example? If there is a storage array, can I assign an equal LUN to all masters and slaves? Also, as you said, is there a different HDFS configuration for masters and slaves?
RE: Secondly, at the point of assign components to masters, ambari automatically assigns components based on three master is there any best-good practice for this assignment ?
Ambari automatically recommends a layout of components based on your environment. So for a demo cluster, I would say you don't really need to change the layout.
So, there is no need to have a dedicated disk for HDFS usage on the master nodes. For the slave nodes, should the disks be of equal size? Can you give an example, or is there any document that describes the cluster setup in more detail?
Thanks. At the point of selecting services for the slaves, there is a default check in the Client box only for the first and the last slave of the cluster. Is there any reason for this?