Support Questions


Can a datanode be shared between 2 separate Ambari Servers?

Contributor

Is it possible to share the same datanode between 2 different clusters, each monitored by its own Ambari Server?

1 ACCEPTED SOLUTION

Master Guru

The answer to that is unfortunately no. The commands installed in /usr/bin all point to a hard-coded config directory, so you cannot have two different config directories (which you would need in order to connect to two different clusters).

May I ask why you would want that?
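To illustrate what "hard point to a config directory" means: on an HDP node, the /usr/bin entries are small wrapper scripts along these lines (the exact paths are an assumption and vary by HDP version; this is a sketch, not the actual file contents):

```shell
#!/bin/bash
# Illustrative sketch of an HDP /usr/bin wrapper script.
# The config directory is fixed here, so every invocation on this
# host talks to the single cluster described under that directory.
export HADOOP_CONF_DIR=/etc/hadoop/conf
exec /usr/hdp/current/hadoop-client/bin/hdfs "$@"
```

Because the wrapper bakes in one HADOOP_CONF_DIR, there is no supported way for the same node's daemons and clients to carry two different cluster configurations at once.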




Contributor

One of our customers is currently running a cluster and would like to create a separate cluster for research purposes. They first looked into monitoring 2 different clusters from one Ambari Server, and we told them that Ambari does not currently support multi-cluster operations. So now they are looking into the feasibility of sharing the datanodes, so that both clusters can use the same data without moving it across clusters or replicating it.

Master Guru

Aaaah, really the absolute same datanodes? So not even two different data folders and configs, but the very same datanode with the same blocks reporting to two different NameNodes? How do you think that would work with file changes? Would those be merged?

Or do you want the same HDFS, period, monitored by two Ambari Servers?

I fail to see how any of that could logically work. How about just implementing two queues instead: a research queue with a percentage of the cluster, and perhaps a folder with a capacity quota as well? That achieves about the same thing.
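For illustration, the queue approach could look something like this in capacity-scheduler.xml (queue names and percentages are made up for the example, not taken from the thread):

```xml
<!-- Illustrative capacity-scheduler.xml fragment: split one cluster
     between a production queue and a research queue. -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,research</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>80</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.research.capacity</name>
  <value>20</value>
</property>
```

The folder capacity can be enforced separately with an HDFS space quota, e.g. `hdfs dfsadmin -setSpaceQuota 10t /research` (path and size are illustrative).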

Contributor

Thanks, Ben, for the valuable information. You are right; we also suggested using queues and allotting a percentage of the same cluster, but they want two separate clusters. So our suggestion to them would be to use HDF (Apache NiFi) to seamlessly replicate data into the second cluster as it is loaded into the first, which would give them the same data in both clusters at the same time.

Master Guru

Or distcp/Falcon. It also depends on how you load the data: if the load is well defined, they could simply duplicate it into both clusters. NiFi, in my opinion, fits very specific scenarios where you have control over the data source. There is also WANdisco, but that would be a big change.
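For reference, a distcp copy between the two clusters might look like this (hostnames and paths are invented for the example; this assumes both NameNodes are reachable from the cluster running the job):

```shell
# Copy /data from cluster A to cluster B as a MapReduce job.
# -update only transfers files that are missing or changed on the target,
# so the job can be re-run periodically to keep the research copy fresh.
hadoop distcp -update \
  hdfs://nn-a.example.com:8020/data \
  hdfs://nn-b.example.com:8020/data
```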