
Accessing HDFS in a NameNode HA environment


In an HA environment we can access HDFS by pointing clients at the active NameNode directly, but I am interested in whether there is a way to access HDFS using the nameservice ID, so that if and when HDFS fails over to the standby NameNode, clients can simply continue to use HDFS without manually changing their configuration.

1 ACCEPTED SOLUTION


In any file system path, you can use the HDFS logical nameservice that spans the two NameNodes in an HA pair. For example, if dfs.nameservices is set to "mycluster", you can use file system paths like hdfs://mycluster/path/to/file.
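To make that concrete, a couple of shell examples (the nameservice "mycluster" and the paths are placeholders; substitute your own dfs.nameservices value):

# List and write using the logical nameservice instead of a NameNode host:port
hdfs dfs -ls hdfs://mycluster/user/alice
hdfs dfs -put data.csv hdfs://mycluster/tmp/data.csv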

In typical deployments, this is covered by setting the configuration property fs.defaultFS in core-site.xml to the logical nameservice, i.e. hdfs://mycluster. That way, any application that refers to a bare path without a scheme or authority, such as hdfs dfs -ls /path, will automatically resolve it against the default file system.
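As a sketch, the corresponding core-site.xml entry would look like this (again assuming "mycluster" is your nameservice):

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
  </property>
</configuration>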

More details on HA configuration are available in the Apache documentation:

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.htm...

This includes further discussion of configuring clients correctly for failover.
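For reference, here is a minimal sketch of the client-side HA properties in hdfs-site.xml, following the pattern from those docs (the nameservice "mycluster", the NameNode IDs "nn1"/"nn2", and the host names are placeholders for your cluster's values):

<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>nn1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>nn2.example.com:8020</value>
</property>
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>

The failover proxy provider is what lets a client retry against the other NameNode automatically when the active one changes.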

Also, I expect this configuration is already fully automated for Ambari deployments of HDFS HA.


2 REPLIES



In an HA environment, you should always refer to the nameservice, not to any one of the NameNodes. The URL syntax is

hdfs://<nameservice>/

Notice that no port number is specified. The HA configuration should be defined under /etc/hadoop/conf (fs.defaultFS in core-site.xml and the dfs.* HA properties in hdfs-site.xml) and must be readable by the client process.
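With that configuration in place, a quick sanity check from the client side (assuming the "mycluster" nameservice used above):

# Should print hdfs://mycluster if the default file system is the nameservice
hdfs getconf -confKey fs.defaultFS
# No host or port needed; the client resolves the active NameNode itself
hdfs dfs -ls hdfs://mycluster/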

WebHDFS does not natively support NameNode HA, but you can use Apache Knox to provide that functionality.
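For example, a curl call through a Knox gateway might look like this (the host, the default port 8443, and the "default" topology name are assumptions about your Knox deployment; adjust them to match):

# Knox forwards the WebHDFS call to whichever NameNode is currently active
curl -ku myuser "https://knox.example.com:8443/gateway/default/webhdfs/v1/tmp?op=LISTSTATUS"

This way a WebHDFS client only ever talks to the gateway and never needs to know the individual NameNode hosts.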