
Not able to access Azure Blob Storage from HDFS, even after adding the access key

Explorer

Hi Team,

We are trying to access Azure Blob Storage from HDFS, and so far we have been unable to do so.
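For reference, the access key was added in core-site.xml roughly as follows (the account name and key below are placeholders; this assumes the standard fs.azure.account.key property used by the WASB driver):

<property>
  <!-- placeholder account name; match the account in the wasbs:// URL -->
  <name>fs.azure.account.key.xxxxxxxx.blob.core.windows.net</name>
  <!-- placeholder for the storage account access key -->
  <value>ACCESS_KEY_PLACEHOLDER</value>
</property>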

We have a secured environment with proxies, so all outgoing traffic passes through the proxy. I have already whitelisted the Blob URL, and I can access and upload files to Blob Storage from the local Linux shell on the same machine where Hadoop is installed.
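I am also not sure whether the JVM running the hdfs command picks up the system proxy settings. If the WASB driver's Azure Storage SDK goes through HttpURLConnection (as older versions do), passing the standard JVM proxy properties might help; the host and port below are placeholders:

# placeholders: replace with the real proxy host and port
export HADOOP_OPTS="$HADOOP_OPTS -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=8080"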

However, when I try to access Azure Blob Storage with an hdfs command, it just hangs and does not give any error.

Following is the command and the output:

hdfs dfs -ls wasbs://xxxx@xxxxxxxx.blob.core.windows.net/

It gets stuck after these steps:

16/12/05 15:45:57 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties

16/12/05 15:45:57 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 60 second(s).

16/12/05 15:45:57 INFO impl.MetricsSystemImpl: azure-file-system metrics system started

Here is the output with debug logging enabled:

export HADOOP_ROOT_LOGGER=DEBUG,console

18/10/29 10:49:47 DEBUG util.Shell: setsid exited with exit code 0

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: i :: Ignore failures during copy ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: p [ARG] :: preserve status (rbugpcaxt)(replication, block-size, user, group, permission, checksum-type, ACL, XATTR, timestamps). If -p is specified with no <arg>, then preserves replication, block size, user, group, permission, checksum type and timestamps. raw.* xattrs are preserved when both the source and destination paths are in the /.reserved/raw hierarchy (HDFS only). raw.* xattrpreservation is independent of the -p flag. Refer to the DistCp documentation for more details. ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: update :: Update target, copying only missingfiles or directories ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: delete :: Delete from target, files missing in source ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: mapredSslConf [ARG] :: Configuration for ssl config file, to use with hftps://. Must be in the classpath. ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: numListstatusThreads [ARG] :: Number of threads to use for building file listing (max 40). ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: m [ARG] :: Max number of concurrent maps to use for copy ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: f [ARG] :: List of files that need to be copied ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: atomic :: Commit all changes or none ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: tmp [ARG] :: Intermediate work path to be used for atomic commit ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: log [ARG] :: Folder on DFS where distcp execution logs are saved ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: v :: Log additional info (path, size) in the SKIP/COPY log ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: strategy [ARG] :: Copy strategy to use. Default is dividing work based on file sizes ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: skipcrccheck :: Whether to skip CRC checks between source and target paths. ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: overwrite :: Choose to overwrite target files unconditionally, even if they exist. ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: append :: Reuse existing data in target files and append new data to them if possible ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: diff [ARG...] :: Use snapshot diff report to identify the difference between source and target ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: async :: Should distcp execution be blocking ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: filelimit [ARG] :: (Deprecated!) Limit number of files copied to <= n ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: sizelimit [ARG] :: (Deprecated!) Limit number of files copied to <= n bytes ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: bandwidth [ARG] :: Specify bandwidth per map in MB ]

18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: filters [ARG] :: The path to a file containing a list of strings for paths to be excluded from the copy. ]

18/10/29 10:49:47 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true

18/10/29 10:49:47 DEBUG security.Groups: Creating new Groups object

18/10/29 10:49:47 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...

18/10/29 10:49:47 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library

18/10/29 10:49:47 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution

18/10/29 10:49:47 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping

18/10/29 10:49:48 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000

18/10/29 10:49:48 DEBUG security.UserGroupInformation: hadoop login

18/10/29 10:49:48 DEBUG security.UserGroupInformation: hadoop login commit

18/10/29 10:49:48 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: svc_hdfs

18/10/29 10:49:48 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: svc_hdfs" with name svc_hdfs

18/10/29 10:49:48 DEBUG security.UserGroupInformation: User entry: "svc_hdfs"

18/10/29 10:49:48 DEBUG security.UserGroupInformation: Assuming keytab is managed externally since logged in from subject.

18/10/29 10:49:48 DEBUG security.UserGroupInformation: UGI loginUser:svc_hdfs (auth:SIMPLE)

18/10/29 10:49:48 DEBUG gcs.GoogleHadoopFileSystemBase: GHFS version: 1.8.1.2.6.5.0-292

18/10/29 10:49:48 DEBUG configuration.ConfigurationUtils: ConfigurationUtils.locate(): base is null, name is hadoop-metrics2-azure-file-system.properties

18/10/29 10:49:48 DEBUG configuration.ConfigurationUtils: ConfigurationUtils.locate(): base is null, name is hadoop-metrics2.properties

18/10/29 10:49:48 DEBUG configuration.ConfigurationUtils: Loading configuration from the context classpath (hadoop-metrics2.properties)

18/10/29 10:49:48 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties

18/10/29 10:49:48 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).

18/10/29 10:49:48 INFO impl.MetricsSystemImpl: azure-file-system metrics system started

18/10/29 10:49:48 DEBUG azure.AzureNativeFileSystemStore: AzureNativeFileSystemStore init. Settings=8,false,90,{3000,3000,30000,30},{true,1.0,1.0}

18/10/29 10:49:48 DEBUG azure.AzureNativeFileSystemStore: Page blob directories:

18/10/29 10:49:48 DEBUG azure.AzureNativeFileSystemStore: Block blobs with compaction directories:

18/10/29 10:49:48 DEBUG azure.AzureNativeFileSystemStore: Atomic rename directories: /hbase

18/10/29 10:49:48 DEBUG azure.NativeAzureFileSystem: NativeAzureFileSystem. Initializing.

18/10/29 10:49:48 DEBUG azure.NativeAzureFileSystem: blockSize = 536870912

18/10/29 10:49:48 DEBUG azure.NativeAzureFileSystem: Getting the file status for wasbs:// /user

18/10/29 10:49:48 DEBUG azure.AzureNativeFileSystemStore: Retrieving metadata for user

18/10/29 10:49:48 DEBUG azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest: threadId=1, requestType=read , isFirstRequest=true, sleepDuration=0

Can anyone please help with this? Let me know if any additional configuration is required.

Regards,

Vishal

2 Replies

New Contributor

Hi,

One reason may be that the cluster nodes on which the MapReduce job runs cannot resolve the Azure storage DNS name.

You need to update the host file (whatever on Windows or Linux) to add:

IP_Of_The_Storage DNS_Name_Of_The_Storage

It has probably been done on the machine where you are running the command, but all the machines composing the cluster should be able to reach and resolve that DNS name.
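For example, a hypothetical sequence (the IP below is a placeholder; resolve the real one first):

# resolve the storage endpoint from a machine that can already reach it
nslookup xxxxxxxx.blob.core.windows.net

# then add the result to /etc/hosts on every cluster node (placeholder IP)
192.0.2.10    xxxxxxxx.blob.core.windows.net

Note that storage endpoint IPs can change over time, so resolving through the proxy or an internal DNS server is generally preferable to pinning entries in the hosts file.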

Master Mentor

@vishal6193 

Can you share how you did your ADLS setup? Please have a look at Hadoop Azure Support: Azure Blob Storage, and in particular at the JARs and the credentials you need to create in a secure environment.
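As a starting point, here is a minimal sketch for checking the JARs and for keeping the key out of plain-text configuration (the NameNode address, provider path, and account name are placeholders; the credential-provider approach is the one described in that documentation):

# check that the hadoop-azure and azure-storage JARs are on the classpath
hadoop classpath --glob | tr ':' '\n' | grep -E 'hadoop-azure|azure-storage'

# store the account key in a JCEKS credential store instead of core-site.xml
# (placeholder NameNode address and path)
hadoop credential create fs.azure.account.key.xxxxxxxx.blob.core.windows.net \
  -provider jceks://hdfs@namenode:8020/user/svc_hdfs/azure.jceks

# then reference the store via hadoop.security.credential.provider.path in core-site.xml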