Why is hdfs://mycluster/ different from /?

Expert Contributor

We have an HDInsight cluster set up in Azure.

When I run hadoop fs -ls / it shows:

drwxr-xr-x   - root    supergroup          0 2016-06-17 20:56 /HdiNotebooks
drwxr-xr-x   - root    supergroup          0 2016-06-17 21:00 /HdiSamples
drwxr-xr-x   - hdfs    supergroup          0 2016-06-17 20:48 /ams
drwxr-xr-x   - hdfs    supergroup          0 2016-06-17 20:48 /amshbase
drwxrwxrwx   - yarn    hadoop              0 2016-06-17 20:48 /app-logs
drwxr-xr-x   - yarn    hadoop              0 2016-06-17 20:48 /atshistory
drwxr-xr-x   - sshuser supergroup          0 2016-06-21 18:38 /data
drwxr-xr-x   - root    supergroup          0 2016-06-17 20:59 /example
drwxr-xr-x   - hdfs    supergroup          0 2016-06-17 20:48 /hdp
drwxr-xr-x   - hdfs    supergroup          0 2016-06-17 20:48 /hive
drwxr-xr-x   - mapred  supergroup          0 2016-06-17 20:48 /mapred
drwx------   - sshuser supergroup          0 2016-06-20 14:22 /mapreducestaging
drwxrwxrwx   - mapred  hadoop              0 2016-06-17 20:48 /mr-history
drwxr-xr-x   - sshuser supergroup          0 2016-06-20 19:20 /sqoop
drwxrwxrwx   - hdfs    supergroup          0 2016-06-17 20:48 /tmp
drwxr-xr-x   - hdfs    supergroup          0 2016-06-17 20:48 /user


But hadoop fs -ls hdfs://mycluster/ shows the following result:

root@hn0-haspar:~# hadoop fs -ls hdfs://mycluster/
Found 3 items
drwxr-xr-x   - root hdfs          0 2016-06-21 18:48 hdfs://mycluster/data
drwx-wx-wx   - root hdfs          0 2016-06-17 20:57 hdfs://mycluster/tmp
drwx------   - root hdfs          0 2016-06-22 17:24 hdfs://mycluster/user


I don't know where this different directory listing is coming from.

The cluster has an HA configuration.

1 ACCEPTED SOLUTION


@roy p, in an HDInsight cluster, the default file system is WASB, which is a Hadoop-compatible file system backed by Azure Storage. The default file system is defined by property fs.defaultFS in core-site.xml. In an HDInsight cluster, you'll see this property set to a "wasb:" URI.
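To make the fs.defaultFS lookup concrete, here is a minimal sketch that reads the property out of core-site.xml content. The XML sample, including the container and account names in the "wasb:" URI, is a made-up placeholder and not taken from the cluster in this thread; get_property is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical core-site.xml content. The container and account names are
# placeholders, not values from the cluster discussed in this thread.
CORE_SITE_XML = """<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>wasb://mycontainer@myaccount.blob.core.windows.net</value>
  </property>
</configuration>
"""


def get_property(xml_text, key):
    """Return the value of a Hadoop configuration property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == key:
            return prop.findtext("value")
    return None


print(get_property(CORE_SITE_XML, "fs.defaultFS"))
```

On a cluster node, running "hdfs getconf -confKey fs.defaultFS" should report the same property without parsing the file by hand.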

When running Hadoop FileSystem Shell commands, if the path is not a qualified URI naming the scheme of the file system, then it assumes that you want the default file system. Thus, running "hadoop fs -ls /" shows results from the WASB file system as persisted in Azure Storage.

HDInsight clusters also run a local instance of HDFS as a supplementary, non-default file system. For a file system that is not the default, the shell commands may reference paths in that file system by qualifying the URI with the scheme. Thus, running "hadoop fs -ls hdfs://mycluster/" shows results from the local HDFS file system, even though WASB is the default file system in an HDInsight cluster.

Since the two commands reference paths on two different file systems, each containing its own set of files, the final results displayed are different.
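The resolution rule described above can be sketched roughly as follows. This is only an illustration of the idea, not Hadoop's actual implementation; qualify is a hypothetical helper, and DEFAULT_FS is a placeholder standing in for whatever fs.defaultFS holds on the cluster.

```python
from urllib.parse import urlsplit

# Placeholder for the cluster's fs.defaultFS value (assumed, not from the thread).
DEFAULT_FS = "wasb://mycontainer@myaccount.blob.core.windows.net"


def qualify(path, default_fs):
    """Roughly mimic how a shell path is matched to a file system.

    Assumes an absolute path. Relative paths, which Hadoop resolves against
    the working directory, are not handled in this sketch.
    """
    if urlsplit(path).scheme:
        # The path already names a file system (e.g. hdfs://...), so keep it.
        return path
    # No scheme: the path is taken to live on the default file system.
    return default_fs.rstrip("/") + "/" + path.lstrip("/")


# "/" lands on the default (WASB) file system; the hdfs: URI stays on HDFS.
print(qualify("/", DEFAULT_FS))
print(qualify("hdfs://mycluster/", DEFAULT_FS))
```

The two printed URIs name different file systems, which is why the two listings in the question do not match.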


4 REPLIES

Expert Contributor

Thanks @Chris Nauroth for the explanation. At present we have NameNode HA, and we are putting data from Flume into this cluster. We have configured hdfs://mycluster/flume as the destination in the Flume sink.

What is the correct way to put data from Flume into the default storage (WASB) and make it accessible from hadoop fs -ls / ?

I'd appreciate any help with this.

@roy p, if you want to route the data from Flume to WASB instead of HDFS, then I expect you can achieve that by changing the "hdfs:" URI to a "wasb:" URI. The full WASB URI will have an authority component that references an Azure Storage account and a container within that account. You can get the WASB URI by looking at configuration property fs.defaultFS in core-site.xml. If that doesn't work, then I recommend creating a new question specifically asking how to configure Flume to write to a file system different from HDFS. Please also apply the "flume" tag to the question. That will help get attention from Flume experts.
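As a rough illustration of the URI change suggested above, the sketch below keeps the path of the existing "hdfs:" destination and re-points it at the default file system. The value of DEFAULT_FS, the helper name to_default_fs, and the agent and sink names ("agent", "sink1") are all assumptions made for the example; hdfs.path is the path property of Flume's HDFS sink.

```python
from urllib.parse import urlsplit

# Placeholder for the fs.defaultFS value read from core-site.xml (assumed).
DEFAULT_FS = "wasb://mycontainer@myaccount.blob.core.windows.net"


def to_default_fs(hdfs_uri, default_fs):
    """Keep the path of an hdfs: URI but point it at another file system."""
    parts = urlsplit(hdfs_uri)
    if parts.scheme != "hdfs":
        raise ValueError("expected an hdfs: URI, got %r" % hdfs_uri)
    return default_fs.rstrip("/") + parts.path


# "agent" and "sink1" are made-up names for the Flume agent and sink.
new_path = to_default_fs("hdfs://mycluster/flume", DEFAULT_FS)
print("agent.sinks.sink1.hdfs.path = " + new_path)
```

The printed line is the kind of entry that would replace the existing destination in the Flume configuration, with the real account and container taken from fs.defaultFS.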

Expert Contributor

Thanks, it worked.