Best practice to connect to remote HA hdfs cluster

New Contributor

Hi folks,

We have some cloud applications which connect to an HA Hortonworks backend. They connect to the usual suspects (HDFS, HBase, Kafka, ZooKeeper), and the backend is Kerberized. The cloud applications run under Docker.

What is the best practice for connecting here? Currently we make each of the cloud hosts part of the Ambari cluster and mount the /etc/hadoop/conf config folders into the containers, adding them to the classpath. This seems rather kludgy and not very portable, but it allows the apps to instantiate Configuration(..) for HDFS.
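For context, a minimal sketch of what the client code amounts to today, assuming the config files end up on the classpath (mounted or baked into the image) and using a placeholder Kerberos principal and keytab path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class PortableHdfsClient {
    public static void main(String[] args) throws Exception {
        // Load core-site.xml / hdfs-site.xml from the application classpath
        // (mounted or baked into the image) instead of a host-managed /etc/hadoop/conf.
        Configuration conf = new Configuration();
        conf.addResource("core-site.xml");
        conf.addResource("hdfs-site.xml");

        // Kerberos login from a keytab shipped with the container.
        // The principal and keytab path below are placeholders.
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);
        UserGroupInformation.loginUserFromKeytab(
                "app-user@EXAMPLE.COM", "/etc/security/keytabs/app-user.keytab");

        // fs.defaultFS in the bundled config points at the HA nameservice,
        // so no single NameNode host is hard-coded here.
        try (FileSystem fs = FileSystem.get(conf)) {
            for (FileStatus status : fs.listStatus(new Path("/"))) {
                System.out.println(status.getPath());
            }
        }
    }
}
```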

Any better recommendations here or other approaches we could take?

What I'd be hoping for is better portability, i.e.:

1) the Docker hosts to be vanilla Docker hosts, possibly even on an OS that is less suited to Ambari, e.g. CoreOS

2) the ability for us to scale out to nodes in other clouds without having to add them to Ambari first

3) still avail of the HA features of the Hadoop NameNodes (see the sketch below)
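On point 3, something along these lines should keep client-side NameNode failover working without any Ambari-managed config files at all. A minimal sketch, where the nameservice ID "mycluster" and the NameNode host names are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HaClientConfig {
    // Builds an HDFS client Configuration for an HA nameservice entirely in
    // code, so the container needs no mounted config directory.
    public static Configuration haConf() {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "namenode1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "namenode2.example.com:8020");
        // Client-side failover: the client tries the active NameNode and
        // fails over to the standby when needed.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                 "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return conf;
    }

    public static void main(String[] args) throws Exception {
        try (FileSystem fs = FileSystem.get(haConf())) {
            System.out.println("Connected to " + fs.getUri());
        }
    }
}
```

The same handful of client-side keys could equally come from a bundled hdfs-site.xml; the point is that nothing here depends on the host being registered with Ambari.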

Apologies if this is a dumb question..

thanks

Mark.

3 REPLIES

Re: Best practice to connect to remote HA hdfs cluster

Re: Best practice to connect to remote HA hdfs cluster

Thanks for the heads up @Namit Maheshwari. I don't have a better solution in mind than what @Mark Davis already described.

Re: Best practice to connect to remote HA hdfs cluster

New Contributor

The best way to connect to the Hadoop cluster as a client server is to register the client server with the ambari-server.

If the client server does not run the same OS version as the ambari-server, then you should install the same version of the Hadoop libraries and config files on the client server.

To handle HA services like NameNode, ResourceManager, Hive, etc. more easily, I'd recommend using the ZooKeeper Curator framework.
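A minimal Curator sketch along those lines, assuming automatic NameNode failover (ZKFC) is enabled and using placeholder ZooKeeper hosts and nameservice ID:

```java
import java.util.List;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkHaLookup {
    public static void main(String[] args) throws Exception {
        // ZooKeeper quorum of the cluster; host names are placeholders.
        String zkQuorum = "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181";

        // Curator client with an exponential-backoff retry policy so transient
        // ZooKeeper outages are retried instead of failing the caller.
        try (CuratorFramework client = CuratorFrameworkFactory.newClient(
                zkQuorum, new ExponentialBackoffRetry(1000, 3))) {
            client.start();

            // With automatic NameNode failover (ZKFC) enabled, HDFS keeps its
            // election state under /hadoop-ha/<nameservice>; "mycluster" is a
            // placeholder nameservice ID.
            List<String> znodes = client.getChildren().forPath("/hadoop-ha/mycluster");
            System.out.println("HA znodes for mycluster: " + znodes);
        }
    }
}
```

The retry policy is the main reason to use Curator over the raw ZooKeeper client here: it handles reconnects and retries for you.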