Figuring out the active name node of a remote Hadoop cluster.

New Contributor

I have specific requirements for applications I'm building that need to access remote clusters. By "remote" I mean a Hadoop cluster whose configuration is not stored in the local hdfs-site.xml/core-site.xml files on the server where the application is hosted.

I've found that the easiest way to connect to a remote cluster in Java is to use the FileSystem API and pass in the active NameNode along with a configuration. However, storing a cluster's Hadoop configuration in the code itself is bad practice, insecure, and unreliable.

Is there a "right" way to be getting a remote cluster's active name node? Does Hadoop provide an API or something that I could call to get this information?

Re: Figuring out the active name node of a remote Hadoop cluster.

Super Guru

@William Bolton

When NameNode HA is enabled, the cluster has what's called a "nameservice". You specify the nameservice and let the Hadoop configuration take care of connecting to whichever NameNode is currently active, so your client code does not need to know which NameNode that is. By the way, you should use client-side configuration files to connect to the cluster.

When you enable HA, you specify the following properties in hdfs-site.xml to define the nameservice:

dfs.nameservices

dfs.ha.namenodes.[nameservice ID]

dfs.namenode.rpc-address.[nameservice ID].[name node ID] (used by RPC clients such as the Java FileSystem API)

dfs.namenode.http-address.[nameservice ID].[name node ID] (used for HTTP access such as the web UI and WebHDFS)

For details, see the following link:

https://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithNFS.ht...
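
As a sketch of how the properties above fit together on the client side, the following Java class builds them for one nameservice and prints them in hdfs-site.xml format. The nameservice ID mycluster, the NameNode IDs nn1/nn2 and the host:port values are hypothetical placeholders. The sketch also adds dfs.client.failover.proxy.provider.[nameservice ID] set to ConfiguredFailoverProxyProvider, the client-side property the Apache HA documentation uses to let clients locate the active NameNode. In a real client you would put the same entries into an org.apache.hadoop.conf.Configuration (for example with conf.set(name, value)) and open the file system with FileSystem.get(URI.create("hdfs://mycluster"), conf).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: client-side HDFS HA properties for a single nameservice.
// The nameservice ID, NameNode IDs and host:port values passed in are hypothetical.
final class HaClientConfig {

    static Map<String, String> clientProperties(String nameservice,
                                                Map<String, String> rpcAddressById) {
        Map<String, String> props = new LinkedHashMap<>();
        // Logical name that clients use in URIs such as hdfs://mycluster/
        props.put("dfs.nameservices", nameservice);
        // Comma-separated NameNode IDs belonging to this nameservice
        props.put("dfs.ha.namenodes." + nameservice,
                String.join(",", rpcAddressById.keySet()));
        // One RPC address (host:port) per NameNode ID
        for (Map.Entry<String, String> e : rpcAddressById.entrySet()) {
            props.put("dfs.namenode.rpc-address." + nameservice + "." + e.getKey(),
                    e.getValue());
        }
        // Class the HDFS client uses to locate the active NameNode and fail over
        props.put("dfs.client.failover.proxy.provider." + nameservice,
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");
        return props;
    }

    // Renders the properties as <property> elements, the layout used by hdfs-site.xml.
    // Values are not XML-escaped; that is fine for host names and class names.
    static String toXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>\n");
        props.forEach((name, value) -> sb.append("  <property>\n")
                .append("    <name>").append(name).append("</name>\n")
                .append("    <value>").append(value).append("</value>\n")
                .append("  </property>\n"));
        return sb.append("</configuration>\n").toString();
    }

    public static void main(String[] args) {
        Map<String, String> nameNodes = new LinkedHashMap<>();
        nameNodes.put("nn1", "nn1.example.com:8020"); // hypothetical hosts
        nameNodes.put("nn2", "nn2.example.com:8020");
        System.out.print(toXml(clientProperties("mycluster", nameNodes)));
    }
}
```

Only the hdfs-site.xml side is shown; fs.defaultFS normally lives in core-site.xml and can be set to hdfs://mycluster if unqualified paths should resolve against the nameservice.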

Re: Figuring out the active name node of a remote Hadoop cluster.

Contributor

@William Bolton You can query each NameNode's JMX metrics to get its HA state (the tag.HAState attribute).

For more information, refer to:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2/bk_hdfs_admin_tools/content/ch07.html
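
A minimal sketch of the JMX approach in Java 11+, using only java.net.http: it asks each NameNode's /jmx servlet for the FSNamesystem bean and picks the first NameNode whose tag.HAState is active. The host names are hypothetical, 50070 is the Hadoop 2.x default NameNode HTTP port (Hadoop 3 uses 9870), and the bean name Hadoop:service=NameNode,name=FSNamesystem is an assumption about where tag.HAState is published; check the /jmx output of your own cluster. A regular expression stands in for a JSON parser to keep the example dependency-free, and on a cluster with HTTP authentication enabled this unauthenticated request will likely be rejected.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: find the active NameNode by reading tag.HAState from each NameNode's /jmx servlet.
// Assumes tag.HAState is published in the FSNamesystem bean and that the HTTP
// endpoint does not require authentication.
final class JmxHaState {

    // Matches  "tag.HAState" : "active"  with any whitespace around the colon
    private static final Pattern HA_STATE =
            Pattern.compile("\"tag\\.HAState\"\\s*:\\s*\"([^\"]+)\"");

    // Extracts the tag.HAState value from a /jmx JSON response, if present
    static Optional<String> parseHaState(String jmxJson) {
        Matcher m = HA_STATE.matcher(jmxJson);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Queries one NameNode; httpAddress is the host:port of its HTTP (web UI) address
    static Optional<String> fetchHaState(HttpClient client, String httpAddress) {
        URI uri = URI.create("http://" + httpAddress
                + "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem");
        HttpRequest request = HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200
                    ? parseHaState(response.body())
                    : Optional.empty();
        } catch (IOException e) {
            return Optional.empty(); // unreachable NameNode: treat as not active
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Optional.empty();
        }
    }

    // Returns the first HTTP address whose NameNode reports HAState "active"
    static Optional<String> findActive(List<String> httpAddresses) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        for (String address : httpAddresses) {
            if (fetchHaState(client, address).filter("active"::equals).isPresent()) {
                return Optional.of(address);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical hosts; 50070 is the Hadoop 2.x default NameNode HTTP port
        List<String> nameNodes = List.of("nn1.example.com:50070", "nn2.example.com:50070");
        System.out.println(findActive(nameNodes).orElse("no active NameNode found"));
    }
}
```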

Re: Figuring out the active name node of a remote Hadoop cluster.

Contributor

Hi @William Bolton, are these applications accessing HDFS directly? What's the mode of access, e.g. the WebHDFS REST API, the Java APIs, or something else?

Re: Figuring out the active name node of a remote Hadoop cluster.

New Contributor

To find the active NameNode, run the hadoop fs -test command against each NameNode; the active NameNode is the one for which the command succeeds.

The following command succeeds if the NameNode is active and fails if it is a standby node (a standby NameNode rejects read operations):

hadoop fs -test -e hdfs://<Name node>/

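Shell script:

#!/usr/bin/env bash
# Replace <NameNode-1> and <NameNode-2> with each NameNode's RPC address (host:port).
# The URIs are quoted so the shell does not treat "<" as input redirection.
active_node=''
if hadoop fs -test -e "hdfs://<NameNode-1>/" 2>/dev/null; then
  active_node='<NameNode-1>'
elif hadoop fs -test -e "hdfs://<NameNode-2>/" 2>/dev/null; then
  active_node='<NameNode-2>'
fi

echo "Active NameNode: $active_node"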
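
For a Java application, the same try-each-NameNode approach can be sketched by running the hadoop CLI through ProcessBuilder. The probe is passed in as a Predicate so the selection logic can be exercised without a cluster; the NameNode addresses are hypothetical, and the sketch assumes the hadoop command is on the PATH. A NameNode that is down also fails the test, so the first success identifies a reachable active NameNode.

```java
import java.io.IOException;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Sketch: Java version of the "try each NameNode" script.
// Assumes the hadoop CLI is on the PATH; NameNode addresses are hypothetical.
final class ActiveNameNodeProbe {

    // Returns the first NameNode for which the probe succeeds, in list order
    static Optional<String> findActive(List<String> nameNodes, Predicate<String> probe) {
        for (String nameNode : nameNodes) {
            if (probe.test(nameNode)) {
                return Optional.of(nameNode);
            }
        }
        return Optional.empty();
    }

    // Runs: hadoop fs -test -e hdfs://<nameNode>/  (exit code 0 means the read succeeded)
    static boolean hadoopFsTest(String nameNode) {
        ProcessBuilder pb = new ProcessBuilder(
                "hadoop", "fs", "-test", "-e", "hdfs://" + nameNode + "/");
        pb.redirectErrorStream(true);                       // merge stderr into stdout
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD); // and discard both (Java 9+)
        try {
            return pb.start().waitFor() == 0;
        } catch (IOException e) {
            return false; // e.g. the hadoop command could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> nameNodes = List.of("nn1.example.com:8020", "nn2.example.com:8020");
        String active = findActive(nameNodes, ActiveNameNodeProbe::hadoopFsTest).orElse("");
        System.out.println("Active NameNode: " + active);
    }
}
```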

Re: Figuring out the active name node of a remote Hadoop cluster.

New Contributor

Could anyone please tell me how I can get the core-site.xml and hdfs-site.xml files? I am building a Gradle project in which I have to create a directory on HDFS on a remote server.
