
Accessing HDFS in Namenode HA environment


In an HA environment, we can access HDFS by pointing clients at the active NameNode's address, but I am interested in whether there is a way to access HDFS using the nameservice ID, so that if and when HDFS fails over to the standby NameNode, the client can simply continue using HDFS without a manual configuration change.

1 ACCEPTED SOLUTION


Re: Accessing HDFS in Namenode HA environment

In any file system path, you can use the HDFS logical nameservice that spans the two NameNodes in an HA pair. For example, assuming dfs.nameservices is set to "mycluster", you use file system paths like hdfs://mycluster/path/to/file.

In typical deployments, this is handled by setting the configuration property fs.defaultFS in core-site.xml to the logical nameservice, i.e. hdfs://mycluster. That way, any application that refers to a bare path without a scheme or authority, such as hdfs dfs -ls /path, will automatically resolve it against the default file system.
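For reference, the client-side properties look roughly like the following. This is a minimal sketch assuming a nameservice named "mycluster" with two NameNodes nn1 and nn2; the hostnames are placeholders, and the actual values must come from your cluster's configuration:

```xml
<!-- core-site.xml -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://mycluster</value>
</property>

<!-- hdfs-site.xml -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn1</name>
  <value>namenode1.example.com:8020</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.mycluster.nn2</name>
  <value>namenode2.example.com:8020</value>
</property>
<!-- Tells the client how to find the active NameNode and retry on failover -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
```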

More details on HA configuration are available in the Apache documentation:

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.htm...

This includes further discussion of configuring clients correctly for failover.

Also, I expect this configuration is fully automated already for Ambari deployments of HDFS HA.



Re: Accessing HDFS in Namenode HA environment


In an HA environment, you should always refer to the nameservice, never to one of the NameNodes directly. The syntax for the URL is

hdfs://<nameservice>/

Notice that no port number is specified. The HA configuration should be defined in /etc/hadoop/conf/core-site.xml and accessible by the process.
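As a concrete illustration, assuming the nameservice is named "mycluster" (substitute your own dfs.nameservices value), client commands can use either the explicit nameservice URI or a bare path that resolves against fs.defaultFS:

```shell
# Explicit nameservice URI: no host or port; the client locates the active NameNode
hdfs dfs -ls hdfs://mycluster/user

# Equivalent bare path, resolved against fs.defaultFS from core-site.xml
hdfs dfs -ls /user

# Confirm which default file system the client configuration resolves to
hdfs getconf -confKey fs.defaultFS
```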

WebHDFS does not natively support NameNode HA: a WebHDFS REST URL points at a specific NameNode's HTTP address, so it does not fail over on its own. You can put Apache Knox in front of the cluster to provide that functionality.
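For example, a WebHDFS call routed through a Knox gateway looks roughly like this. The hostname, port, topology name "default", path, and credentials are all placeholders for this sketch; Knox proxies the request to whichever NameNode is currently active:

```shell
# List a directory via Knox's WebHDFS proxy (placeholder host/topology/user)
curl -ku myuser:mypassword \
  "https://knox.example.com:8443/gateway/default/webhdfs/v1/user/myuser?op=LISTSTATUS"
```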