
Figuring out the active NameNode of a remote Hadoop cluster.

New Contributor

I have specific requirements for applications I'm building that need to access remote clusters. By "remote" I mean a Hadoop cluster whose configuration is not stored in the local hdfs-site.xml/core-site.xml files on the server where the application is hosted.

I've found that the easiest way to connect to a remote cluster in Java is to use the FileSystem API and pass in the active NameNode along with a Configuration. However, it is bad practice, insecure, and unreliable to hard-code a cluster's configuration in the application itself.

Is there a "right" way to get a remote cluster's active NameNode? Does Hadoop provide an API I could call to get this information?
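
For context, here is a minimal sketch of the hard-coded approach I mean (the host name and port are hypothetical placeholders; this is exactly what I'd like to avoid, since the active NameNode can change on failover):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RemoteHdfsExample {
    public static void main(String[] args) throws Exception {
        // Hard-coded active NameNode; brittle, because a failover
        // makes this address point at a standby node.
        URI activeNameNode = URI.create("hdfs://nn1.example.com:8020");
        Configuration conf = new Configuration();
        try (FileSystem fs = FileSystem.get(activeNameNode, conf)) {
            System.out.println(fs.exists(new Path("/")));
        }
    }
}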


Super Guru

@William Bolton

When you have NameNode HA enabled, you have what's called a "nameservice". You specify the nameservice and let the Hadoop configuration take care of connecting to whichever NameNode is active. You don't have to worry about which NameNode is active in your client code. By the way, you should use client-side configuration files to connect to the cluster.

You would specify the following in your hdfs-site.xml when you enable HA so that you have a nameservice:

dfs.nameservices

dfs.ha.namenodes.[nameservice ID]

dfs.namenode.rpc-address.[nameservice ID].[name node ID] or

dfs.namenode.http-address.[nameservice ID].[name node ID]

Check the following link:

https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.ht...
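
For illustration, here is a minimal sketch of connecting through a nameservice from Java, with the same properties set programmatically (the nameservice "mycluster", the NameNode IDs, and the host names are hypothetical; in practice these values belong in the client-side hdfs-site.xml, not in code):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NameserviceExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020");
        // The failover proxy provider is what resolves the active NameNode.
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        try (FileSystem fs = FileSystem.get(conf)) {
            // The client fails over transparently; the code never needs to
            // know which NameNode is currently active.
            System.out.println(fs.exists(new Path("/")));
        }
    }
}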

Rising Star

@William Bolton You can use JMX to get the HA state (tag.HAState).

For more information, you can refer to:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/ch07.html
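
As a rough sketch, you can read the NameNode's JMX servlet over HTTP and look for tag.HAState in the FSNamesystem bean (the host name is hypothetical, 50070 is the Hadoop 2.x default NameNode HTTP port, and a real client should parse the JSON instead of string matching):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class HaStateCheck {
    public static void main(String[] args) throws Exception {
        // The FSNamesystem MBean exposes "tag.HAState" (active or standby).
        URL url = new URL("http://nn1.example.com:50070"
                + "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                body.append(line);
            }
        }
        // Naive check; the exact whitespace of the JSON may vary,
        // so use a JSON parser in real code.
        System.out.println(body.toString().contains("\"tag.HAState\" : \"active\"")
                ? "active" : "not active (or unexpected response)");
    }
}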


Explorer

Try this option:

[serviceaccount@edgenode ~]$ hdfs getconf -confKey dfs.nameservices
hadoopcdhnn
[serviceaccount@edgenode ~]$ hdfs getconf -confKey dfs.ha.namenodes.hadoopcdhnn
namenode5605,namenode5456
[serviceaccount@edgenode ~]$ hdfs haadmin -getServiceState namenode5605
active
[serviceaccount@edgenode ~]$ hdfs haadmin -getServiceState namenode5456
standby


Hi @William Bolton, are these applications accessing HDFS directly? What's the mode of access, e.g. WebHDFS REST API, Java APIs, or something else?

New Contributor

To find the active NameNode, we can execute a test HDFS command against each of the NameNodes; the active one is the one for which the command succeeds.

The command below exits successfully if the NameNode is active and fails if it is a standby node.

hadoop fs -test -e hdfs://<NameNode>/

Unix script:

active_node=''
# Probe each NameNode in turn; -test -e succeeds only against the active one.
if hadoop fs -test -e hdfs://<NameNode-1>/ ; then
    active_node='<NameNode-1>'
elif hadoop fs -test -e hdfs://<NameNode-2>/ ; then
    active_node='<NameNode-2>'
fi

echo "Active NameNode: $active_node"

New Contributor

Could anyone please tell me how I can get the core-site.xml and hdfs-site.xml files? I am building a Gradle project in which I have to create a directory on HDFS, which is on a remote server.
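
A minimal sketch, assuming the client configuration files have already been obtained (for example, downloaded from the cluster's management UI) and saved locally; the local paths and the HDFS directory below are hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MkdirExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical local copies of the remote cluster's client configs.
        conf.addResource(new Path("/etc/remote-cluster/core-site.xml"));
        conf.addResource(new Path("/etc/remote-cluster/hdfs-site.xml"));
        try (FileSystem fs = FileSystem.get(conf)) {
            fs.mkdirs(new Path("/user/example/new-dir"));
        }
    }
}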