We run a cluster with two NameNodes in HA mode and several DataNodes,
plus HBase with two HBase Masters and RegionServers installed on the DataNodes.
On a separate master host I have the Ambari Metrics Collector running in distributed mode.
timeline.metrics.service.operation.mode = distributed
hbase.cluster.distributed = true
hbase.rootdir = hdfs://cluster/apps/ams/metrics
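As a quick sanity check, the three settings above can be verified for consistency with a small script. This is only a sketch: the helper names `parse_properties` and `check_distributed_mode` are made up for illustration, and the rules encode only what the settings above show (mode set to `distributed`, HBase in distributed mode, root directory on HDFS).

```python
def parse_properties(text):
    """Parse 'key = value' lines into a dict, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def check_distributed_mode(props):
    """Return a list of problems; an empty list means the settings agree."""
    problems = []
    if props.get("timeline.metrics.service.operation.mode") != "distributed":
        problems.append("operation mode is not 'distributed'")
    if props.get("hbase.cluster.distributed", "").lower() != "true":
        problems.append("hbase.cluster.distributed is not true")
    if not props.get("hbase.rootdir", "").startswith("hdfs://"):
        problems.append("hbase.rootdir does not point to HDFS")
    return problems


settings = """
timeline.metrics.service.operation.mode = distributed
hbase.cluster.distributed = true
hbase.rootdir = hdfs://cluster/apps/ams/metrics
"""

problems = check_distributed_mode(parse_properties(settings))
print(problems if problems else "settings are consistent with distributed mode")
```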
Only the "HBase Client" and "HDFS Client" components are installed on that node.
Ambari warns me that I should install a DataNode component on that node:
"It's recommended to install Datanode component on host.domain.tld to speed up IO operations between HDFS and Metrics Collector in distributed mode"
On https://cwiki.apache.org/confluence/display/AMBARI/AMS+-+distributed+mode I found the following statement:
"Note: Make sure there is a local Datanode hosted with the Collector, it provides AMS HBase the distinct advantage of write and reads sharded across the data volumes available to the DN."
What does that mean? Should I install a DataNode without
any data disks configured on the node running the Ambari Metrics Collector, or should I move the
Ambari Metrics Collector to an existing DataNode?
Is there a best practice for distributing the services across the cluster hosts?
The best way to distribute your services in your case is to use 4 machines.
Just add the services and Ambari will distribute them across the 4 machines:
First machine: primary NameNode, HBase, ZooKeeper ...
Second machine: secondary NameNode and any other services you want (Hive, for example)
Third machine: Ambari Metrics Collector
Fourth machine: DataNode (check the client box in the installation steps)
This answer is tested.
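To compare a planned layout against the recommendation quoted in the question (a DataNode on the same host as the Metrics Collector), a small check like the following can help. It is only a sketch: the host names `machine1` to `machine4`, the component labels, and the function `collector_hosts_without_datanode` are placeholders for illustration, not Ambari identifiers or API calls.

```python
COLLECTOR = "Metrics Collector"
DATANODE = "DataNode"


def collector_hosts_without_datanode(layout):
    """Return hosts that run the Metrics Collector but no DataNode."""
    return sorted(
        host
        for host, components in layout.items()
        if COLLECTOR in components and DATANODE not in components
    )


# Hypothetical layout mirroring the four machines listed above.
layout = {
    "machine1": {"NameNode", "HBase Master", "ZooKeeper"},
    "machine2": {"Secondary NameNode", "Hive"},
    "machine3": {"Metrics Collector"},
    "machine4": {"DataNode", "HDFS Client", "HBase Client"},
}

for host in collector_hosts_without_datanode(layout):
    print(f"{host}: Metrics Collector has no local DataNode")
```

With this layout the check reports `machine3`, so the Ambari recommendation would presumably still be shown there unless a DataNode is added to that host or the collector is moved to the DataNode host.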