Support Questions


Best practice to connect to a remote HA HDFS cluster


Hi folks,

We have some cloud applications which connect to an HA Hortonworks backend. They connect to the usual suspects: HDFS, HBase, Kafka, and ZooKeeper, and the backend is kerberized. The cloud applications run under Docker.

What is the best practice for connecting here? Currently we make each of the cloud hosts part of the Ambari cluster and bind-mount the /etc/hadoop/conf config folder into the containers, adding it to the classpath. This seems rather kludgy and not very portable, but it allows the apps to instantiate a Configuration(..) for HDFS.
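For context, the current kludge looks roughly like this (a sketch only; the service name, image, and host paths are illustrative, not our real setup):

```yaml
# docker-compose sketch: bind-mount the Ambari-managed client config
# from the host into the container, read-only
services:
  cloud-app:
    image: our-app:latest            # placeholder image name
    volumes:
      - /etc/hadoop/conf:/etc/hadoop/conf:ro
    environment:
      # picked up by the Hadoop scripts and used to build the classpath
      HADOOP_CONF_DIR: /etc/hadoop/conf
```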

Any better recommendations here or other approaches we could take?

What I'd be hoping for is better portability, i.e.:

1) the Docker hosts to be vanilla Docker hosts, possibly even on an OS that might be less suited to Ambari, e.g. CoreOS

2) the ability for us to scale out to nodes in other clouds without having to add them to Ambari first

3) still avail of the HA features of the Hadoop NameNodes.
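On points 2) and 3): the HDFS client can be pointed at an HA nameservice without any Ambari-managed config on the host, by baking a minimal hdfs-site.xml into the image. A sketch, assuming a nameservice called mycluster with two NameNodes nn1/nn2 (the nameservice id and hostnames are placeholders):

```xml
<!-- minimal HA client config; set fs.defaultFS=hdfs://mycluster in core-site.xml -->
<configuration>
  <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
  </property>
  <property>
    <name>dfs.ha.namenodes.mycluster</name>
    <value>nn1,nn2</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn1</name>
    <value>namenode1.example.com:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.mycluster.nn2</name>
    <value>namenode2.example.com:8020</value>
  </property>
  <property>
    <!-- lets the client fail over between the two NameNodes itself -->
    <name>dfs.client.failover.proxy.provider.mycluster</name>
    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
  </property>
</configuration>
```

With this on the classpath the client fails over between NameNodes on its own; the kerberos-related properties would still need to be carried along the same way.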

Apologies if this is a dumb question.




Thanks for the heads up @Namit Maheshwari. I don't have a better solution in mind than what @Mark Davis already described.


The best way to connect to the Hadoop cluster as a client is to register the client host with the ambari-server as a cluster client.

If the client host doesn't run the same OS version as the ambari-server host, then you should set up the same version of the Hadoop libraries and config files on the client host yourself.

To handle HA services like the NameNode, ResourceManager, Hive, etc. more easily, I'd recommend using the ZooKeeper Curator framework.
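A minimal Curator connection sketch, in case it helps. Assumptions: the quorum hosts, nameservice id, and znode path are placeholders; the project needs org.apache.curator:curator-framework on its classpath; and against a kerberized ZooKeeper the JVM would additionally need the usual JAAS/SASL configuration:

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ActiveNameNodeLookup {
    public static void main(String[] args) throws Exception {
        String zkQuorum = "zk1:2181,zk2:2181,zk3:2181"; // placeholder quorum
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                zkQuorum, new ExponentialBackoffRetry(1000, 3));
        zk.start();
        try {
            // The HDFS ZKFC publishes the active NameNode under
            // /hadoop-ha/<nameservice>/ActiveBreadCrumb (a protobuf-encoded
            // record, so real code would deserialize it rather than print it).
            byte[] active = zk.getData()
                    .forPath("/hadoop-ha/mycluster/ActiveBreadCrumb");
            System.out.println("active NN record: " + active.length + " bytes");
        } finally {
            zk.close();
        }
    }
}
```

Note that for plain HDFS access the client-side failover proxy provider already handles HA transparently; Curator is mainly useful when the application itself needs to discover or watch the active master.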