Is it possible to have one Hadoop client machine point to two Hadoop clusters?

New Contributor

I would like to ingest data from one Hadoop client machine into two Hadoop clusters, for example, one for DEV and one for Production. Is it possible to configure the client machine so that we can switch between the two clusters?

3 REPLIES

Re: Is it possible to have one Hadoop client machine point to two Hadoop clusters?

Mentor

@Stephanie Shen It is certainly possible, but the node would have to be an edge node that is not managed by the Ambari agent, because Ambari will try to overwrite your configurations. What you can do is maintain multiple configuration directories and point the HDFS client at the one you need by exporting the HADOOP_CONF_DIR environment variable. A word of warning: this kind of setup makes it easy to mix up configurations and run commands against an environment you didn't intend to touch. It's better to spin up a few lightweight nodes, one per environment, or to use the Ambari HDFS Files view to interact with HDFS; that way you can keep a separate browser tab per environment and the complications with environment variables go away. We can discuss this further offline.
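For illustration, here is a minimal sketch of the multiple-configuration-directory approach described above. The directory names (/etc/hadoop/conf.dev, /etc/hadoop/conf.prod) and the file paths are assumptions for the example, not paths from this thread; HADOOP_CONF_DIR and the hdfs dfs commands are standard Hadoop.

# Keep one client configuration directory per cluster (paths are hypothetical):
#   /etc/hadoop/conf.dev   - core-site.xml, hdfs-site.xml, ... copied from the DEV cluster
#   /etc/hadoop/conf.prod  - core-site.xml, hdfs-site.xml, ... copied from the Production cluster

# Point the Hadoop client at the DEV cluster for this shell session.
export HADOOP_CONF_DIR=/etc/hadoop/conf.dev
hdfs dfs -put /data/input.csv /landing/

# Switch the same client to the Production cluster.
export HADOOP_CONF_DIR=/etc/hadoop/conf.prod
hdfs dfs -put /data/input.csv /landing/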

Re: Is it possible to have one Hadoop client machine point to two Hadoop clusters?

New Contributor

Hi Stephanie Shen,

In my opinion, it is best to have a separate client machine for each cluster.

Regards,

Fahim

Re: Is it possible to have one Hadoop client machine point to two Hadoop clusters?

New Contributor

You can achieve this by adding some parameters to hdfs-site in Ambari, which manages this client, so that the local cluster (including the client host) knows where to connect when you use the remote cluster name in commands over the hdfs protocol. It's not exactly what you asked, but it is an option that avoids maintaining a separate Hadoop configuration directory outside Ambari, if you only need to connect to the NameNodes/DataNodes to send data.

But I agree with the previous comments that the best and most secure option to avoid mistakes is to have separate machines.

Assuming that you're connecting to a remote cluster with NameNode HA, configure hdfs-site (in Ambari) with the parameters in the example below:

<property>
  <name>dfs.client.failover.proxy.provider.REMOTECLUSTERNAME</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
  <name>dfs.ha.namenodes.REMOTECLUSTERNAME</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.REMOTECLUSTERNAME.nn1</name>
  <value>namenode1ofremoteclusterhere:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.REMOTECLUSTERNAME.nn2</name>
  <value>namenode2ofremoteclusterhere:8020</value>
</property>
<property>
  <name>dfs.nameservices</name>
  <value>LOCALCLUSTERNAMEALREADYCONFIGURED,REMOTECLUSTERNAME</value>
</property>
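With those properties in place, the client can address the remote cluster by its nameservice name. A small sketch of how that might look; the directory and file names are placeholders, not values from this thread:

# List a directory on the remote HA cluster using its nameservice name.
hdfs dfs -ls hdfs://REMOTECLUSTERNAME/tmp

# Copy a local file to the remote cluster for ingestion (paths are illustrative).
hdfs dfs -put /data/input.csv hdfs://REMOTECLUSTERNAME/landing/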

Reference: https://community.hortonworks.com/articles/40373/making-your-cluster-aware-of-multiple-namenode-ha.h...
