I have specific requirements for applications I'm building that need to access remote clusters. By "remote" I mean a Hadoop cluster whose configuration is not stored in the local hdfs-site.xml/core-site.xml files on the server where the application is hosted.
I've found the easiest way to connect to a remote cluster from Java is to use the FileSystem API and pass in the active NameNode's address along with a Configuration. However, it is bad practice, insecure, and unreliable to hard-code a cluster's configuration in the application itself.
Is there a "right" way to get a remote cluster's active NameNode? Does Hadoop provide an API I could call to get this information?
When you have NameNode HA enabled, you have what's called a "nameservice". You specify the nameservice and let the Hadoop client configuration take care of connecting to whichever NameNode is currently active, so your client code never has to know which one that is. As a rule, you should use client-side configuration files to connect to the cluster rather than hard-coding addresses.
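To make that concrete, here is a minimal sketch of a Java client that relies on the nameservice rather than a specific NameNode host. It assumes the remote cluster's client-side core-site.xml and hdfs-site.xml are on the classpath and define a nameservice; the ID "mycluster" and the path are placeholders.

```java
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteHdfsClient {
    public static void main(String[] args) throws IOException {
        // new Configuration() loads core-site.xml / hdfs-site.xml found on
        // the classpath; ship the remote cluster's *client* config files
        // with the application instead of hard-coding addresses.
        Configuration conf = new Configuration();

        // "mycluster" is a placeholder nameservice ID defined in that
        // hdfs-site.xml. The HA client resolves it to whichever NameNode
        // is currently active and fails over automatically.
        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster/"), conf);
        fs.mkdirs(new Path("/tmp/example"));
        fs.close();
    }
}
```

Because the nameservice is resolved on the client side, this code keeps working across NameNode failovers without any change.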
You would specify the following properties in your hdfs-site.xml when you enable HA, so that you have a nameservice:

    dfs.namenode.rpc-address.[nameservice ID].[name node ID]
    dfs.namenode.http-address.[nameservice ID].[name node ID]
See the Apache Hadoop documentation on HDFS High Availability for details.
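For reference, the HA section of a client-side hdfs-site.xml might look roughly like this. The nameservice ID mycluster, NameNode IDs nn1/nn2, and hostnames below are all placeholders for your own values:

```xml
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```

With these properties in place, a client can simply open hdfs://mycluster/ and the failover proxy provider picks the active NameNode.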
To find the active NameNode, we can run a test HDFS command against each NameNode; the active one is the one on which the command succeeds. The command below succeeds when the NameNode is active and fails when it is a standby:
hadoop fs -test -e hdfs://<Name node>/
    active_node=''
    if hadoop fs -test -e hdfs://<NameNode-1>/ ; then
        active_node='<NameNode-1>'
    elif hadoop fs -test -e hdfs://<NameNode-2>/ ; then
        active_node='<NameNode-2>'
    fi
    echo "Active Dev Name node : $active_node"
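The same probe generalizes to any number of NameNodes. Here is a small sketch that wraps it in a function; the hostnames in the usage comment are placeholders.

```shell
# Probe a list of NameNode hosts and print the first one that answers as
# active. `hadoop fs -test -e` succeeds only against the active NameNode;
# a standby rejects client read operations.
find_active_namenode() {
    for nn in "$@"; do
        if hadoop fs -test -e "hdfs://${nn}/" 2>/dev/null; then
            echo "$nn"
            return 0
        fi
    done
    return 1    # no active NameNode found
}

# Usage (placeholder hostnames):
# find_active_namenode namenode1.example.com namenode2.example.com
```

Note that this is a workaround for clusters where you cannot ship client config files; when a nameservice is configured, the HA client handles this for you.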
Could anyone please tell me how I can get the core-site.xml and hdfs-site.xml files? I am building a Gradle project in which I have to create a directory on HDFS, which is on a remote server.