Created 10-22-2015 08:20 PM
In an HA environment, we can access HDFS by referring directly to the active NameNode, but I would like to know whether there is a way to access HDFS using the nameservice ID. That way, if and when HDFS fails over to the standby (passive) NameNode, the client could continue to use HDFS without anyone manually changing the configuration.
Created 10-22-2015 08:26 PM
In any file system path, you can use the HDFS logical nameservice that spans the two NameNodes in an HA pair. For example, assuming dfs.nameservices is set to "mycluster", you use file system paths like hdfs://mycluster/path/to/file.
In typical deployments, this is handled by setting the configuration property fs.defaultFS in core-site.xml to the logical nameservice URI, i.e. hdfs://mycluster. That way, any application that refers to a bare path without a scheme or authority, such as hdfs dfs -ls /path, will automatically resolve it against the default file system.
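As a rough illustration of how a bare path picks up the default file system, here is a small Python sketch of the resolution step. It is a simplified model, not Hadoop's actual path-qualification code; `qualify` is a hypothetical helper, and it only handles absolute paths (Hadoop resolves relative paths against the user's working directory, which this sketch does not model).

```python
from urllib.parse import urlparse, urlunparse


def qualify(path: str, default_fs: str) -> str:
    """Resolve a path against fs.defaultFS (hypothetical, simplified model)."""
    parsed = urlparse(path)
    if parsed.scheme:
        # Already fully qualified, e.g. hdfs://mycluster/x: use it unchanged.
        return path
    if not path.startswith("/"):
        # Relative paths would resolve against the working directory in Hadoop;
        # this sketch deliberately does not model that.
        raise ValueError("only absolute paths are handled in this sketch")
    base = urlparse(default_fs)
    # Take the scheme and authority (the logical nameservice) from fs.defaultFS.
    return urlunparse((base.scheme, base.netloc, path, "", "", ""))


print(qualify("/path", "hdfs://mycluster"))  # hdfs://mycluster/path
```

Because the authority is the nameservice rather than a host:port, nothing in the resolved path changes when the active NameNode changes.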
More details on HA configuration, including further discussion of configuring clients correctly for failover, are available in the Apache Hadoop documentation.
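For reference, the client-side failover settings normally live in hdfs-site.xml. The sketch below generates a minimal set of those properties with Python's standard library. The property names follow the Apache HDFS HA documentation, while the nameservice `mycluster`, the NameNode IDs `nn1`/`nn2`, and the host names and port are placeholder assumptions.

```python
import xml.etree.ElementTree as ET

FAILOVER_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def ha_client_properties(nameservice: str, namenodes: dict) -> dict:
    """Build client HA properties; namenodes maps NameNode ID -> host:port."""
    props = {
        "dfs.nameservices": nameservice,
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        # Class the client uses to find the active NameNode and fail over.
        f"dfs.client.failover.proxy.provider.{nameservice}": FAILOVER_PROVIDER,
    }
    for nn_id, addr in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = addr
    return props


def to_hadoop_xml(props: dict) -> str:
    """Render properties in the <configuration><property> layout Hadoop uses."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Placeholder hosts: substitute your own NameNode RPC addresses.
props = ha_client_properties(
    "mycluster", {"nn1": "nn1.example.com:8020", "nn2": "nn2.example.com:8020"}
)
print(to_hadoop_xml(props))
```

A complete HA deployment also sets fs.defaultFS in core-site.xml and further properties (for example NameNode HTTP addresses and shared edits settings), which this sketch leaves out.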
Also, I expect this configuration is fully automated already for Ambari deployments of HDFS HA.
Created 10-22-2015 08:34 PM
In an HA environment, you should always refer to the nameservice, not to either of the individual NameNodes. The syntax for the URL is:
hdfs://<nameservice>/
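Notice that no port number is specified. The HA configuration must be defined in the client configuration under /etc/hadoop/conf (fs.defaultFS in core-site.xml, the nameservice and NameNode definitions in hdfs-site.xml) and be readable by the process.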
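A quick way to sanity-check a URI against that rule is sketched below. `check_ha_uri` is a hypothetical helper, and the set of nameservices passed in is an assumption; in practice it would come from dfs.nameservices in the client configuration.

```python
from urllib.parse import urlparse


def check_ha_uri(uri: str, nameservices: set) -> list:
    """Return a list of problems with an HDFS HA URI (empty list means OK)."""
    parsed = urlparse(uri)
    problems = []
    if parsed.scheme != "hdfs":
        problems.append("scheme should be hdfs")
    if parsed.port is not None:
        # The nameservice is resolved to NameNode addresses via configuration,
        # so a port in the URI indicates a single NameNode is being targeted.
        problems.append("URI has a port; a nameservice URI should not")
    if parsed.hostname not in nameservices:
        problems.append(f"{parsed.hostname!r} is not a configured nameservice")
    return problems


print(check_ha_uri("hdfs://mycluster/", {"mycluster"}))  # []
```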
WebHDFS does not natively support NameNode HA, but you can use Apache Knox to provide that functionality.
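With Knox in front, WebHDFS REST calls go to the gateway rather than to a specific NameNode, and Knox is responsible for reaching a working NameNode. A minimal sketch of building such a URL is below, assuming the default gateway path `gateway`, a topology named `default`, and a placeholder gateway host on port 8443; all of these are assumptions to check against your own Knox topology.

```python
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway: str, topology: str, hdfs_path: str,
                     op: str, **params: str) -> str:
    """Build a WebHDFS REST URL routed through a Knox gateway (sketch)."""
    # Percent-encode the HDFS path but keep "/" separators intact.
    path = quote(hdfs_path.lstrip("/"))
    query = urlencode({"op": op, **params})
    return f"https://{gateway}/gateway/{topology}/webhdfs/v1/{path}?{query}"


print(knox_webhdfs_url("knox.example.com:8443", "default", "/path/to/file", "OPEN"))
# https://knox.example.com:8443/gateway/default/webhdfs/v1/path/to/file?op=OPEN
```

The client only ever sees the gateway address, so a NameNode failover does not require changing the URL.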