Created on 10-29-2018 07:39 PM - last edited on 12-02-2019 05:50 AM by cjervis
Hi Team,
We are trying to access Azure Blob storage from HDFS, and so far we have not been able to get it working.
We have a secured environment with proxies, so all outgoing traffic passes through the proxy. I have already whitelisted the blob URL, and I can access and upload files to Blob storage from the local Linux system on the same machine where Hadoop is installed.
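For reference, reachability of the blob endpoint through the proxy can be confirmed with a plain HTTPS request along these lines (the account, container, and proxy host/port below are placeholders):
# Route the request through the corporate proxy (placeholder host/port)
export https_proxy=http://proxyhost:3128
# Any HTTP response (even a 403/404 for this unauthenticated request) means the endpoint is reachable;
# a hang or timeout points to the proxy or firewall.
curl -I "https://xxxxxxxx.blob.core.windows.net/xxxx?restype=container&comp=list"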
However, when I try to access Azure Blob storage with an hdfs command, it just hangs and does not give any error.
Following is the command and its output:
hdfs dfs -ls wasbs://xxxx@xxxxxxxx.blob.core.windows.net/
It gets stuck after these lines:
16/12/05 15:45:57 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
16/12/05 15:45:57 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
16/12/05 15:45:57 INFO impl.MetricsSystemImpl: azure-file-system metrics system started
Even after enabling debug logging:
export HADOOP_ROOT_LOGGER=DEBUG,console
18/10/29 10:49:47 DEBUG util.Shell: setsid exited with exit code 0
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: i :: Ignore failures during copy ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: p [ARG] :: preserve status (rbugpcaxt)(replication, block-size, user, group, permission, checksum-type, ACL, XATTR, timestamps). If -p is specified with no <arg>, then preserves replication, block size, user, group, permission, checksum type and timestamps. raw.* xattrs are preserved when both the source and destination paths are in the /.reserved/raw hierarchy (HDFS only). raw.* xattrpreservation is independent of the -p flag. Refer to the DistCp documentation for more details. ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: update :: Update target, copying only missingfiles or directories ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: delete :: Delete from target, files missing in source ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: mapredSslConf [ARG] :: Configuration for ssl config file, to use with hftps://. Must be in the classpath. ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: numListstatusThreads [ARG] :: Number of threads to use for building file listing (max 40). ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: m [ARG] :: Max number of concurrent maps to use for copy ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: f [ARG] :: List of files that need to be copied ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: atomic :: Commit all changes or none ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: tmp [ARG] :: Intermediate work path to be used for atomic commit ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: log [ARG] :: Folder on DFS where distcp execution logs are saved ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: v :: Log additional info (path, size) in the SKIP/COPY log ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: strategy [ARG] :: Copy strategy to use. Default is dividing work based on file sizes ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: skipcrccheck :: Whether to skip CRC checks between source and target paths. ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: overwrite :: Choose to overwrite target files unconditionally, even if they exist. ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: append :: Reuse existing data in target files and append new data to them if possible ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: diff [ARG...] :: Use snapshot diff report to identify the difference between source and target ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: async :: Should distcp execution be blocking ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: filelimit [ARG] :: (Deprecated!) Limit number of files copied to <= n ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: sizelimit [ARG] :: (Deprecated!) Limit number of files copied to <= n bytes ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: bandwidth [ARG] :: Specify bandwidth per map in MB ]
18/10/29 10:49:47 DEBUG tools.OptionsParser: Adding option [ option: filters [ARG] :: The path to a file containing a list of strings for paths to be excluded from the copy. ]
18/10/29 10:49:47 DEBUG security.SecurityUtil: Setting hadoop.security.token.service.use_ip to true
18/10/29 10:49:47 DEBUG security.Groups: Creating new Groups object
18/10/29 10:49:47 DEBUG util.NativeCodeLoader: Trying to load the custom-built native-hadoop library...
18/10/29 10:49:47 DEBUG util.NativeCodeLoader: Loaded the native-hadoop library
18/10/29 10:49:47 DEBUG security.JniBasedUnixGroupsMapping: Using JniBasedUnixGroupsMapping for Group resolution
18/10/29 10:49:47 DEBUG security.JniBasedUnixGroupsMappingWithFallback: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMapping
18/10/29 10:49:48 DEBUG security.Groups: Group mapping impl=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback; cacheTimeout=300000; warningDeltaMs=5000
18/10/29 10:49:48 DEBUG security.UserGroupInformation: hadoop login
18/10/29 10:49:48 DEBUG security.UserGroupInformation: hadoop login commit
18/10/29 10:49:48 DEBUG security.UserGroupInformation: using local user:UnixPrincipal: svc_hdfs
18/10/29 10:49:48 DEBUG security.UserGroupInformation: Using user: "UnixPrincipal: svc_hdfs" with name svc_hdfs
18/10/29 10:49:48 DEBUG security.UserGroupInformation: User entry: "svc_hdfs"
18/10/29 10:49:48 DEBUG security.UserGroupInformation: Assuming keytab is managed externally since logged in from subject.
18/10/29 10:49:48 DEBUG security.UserGroupInformation: UGI loginUser:svc_hdfs (auth:SIMPLE)
18/10/29 10:49:48 DEBUG gcs.GoogleHadoopFileSystemBase: GHFS version: 1.8.1.2.6.5.0-292
18/10/29 10:49:48 DEBUG configuration.ConfigurationUtils: ConfigurationUtils.locate(): base is null, name is hadoop-metrics2-azure-file-system.properties
18/10/29 10:49:48 DEBUG configuration.ConfigurationUtils: ConfigurationUtils.locate(): base is null, name is hadoop-metrics2.properties
18/10/29 10:49:48 DEBUG configuration.ConfigurationUtils: Loading configuration from the context classpath (hadoop-metrics2.properties)
18/10/29 10:49:48 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
18/10/29 10:49:48 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
18/10/29 10:49:48 INFO impl.MetricsSystemImpl: azure-file-system metrics system started
18/10/29 10:49:48 DEBUG azure.AzureNativeFileSystemStore: AzureNativeFileSystemStore init. Settings=8,false,90,{3000,3000,30000,30},{true,1.0,1.0}
18/10/29 10:49:48 DEBUG azure.AzureNativeFileSystemStore: Page blob directories:
18/10/29 10:49:48 DEBUG azure.AzureNativeFileSystemStore: Block blobs with compaction directories:
18/10/29 10:49:48 DEBUG azure.AzureNativeFileSystemStore: Atomic rename directories: /hbase
18/10/29 10:49:48 DEBUG azure.NativeAzureFileSystem: NativeAzureFileSystem. Initializing.
18/10/29 10:49:48 DEBUG azure.NativeAzureFileSystem: blockSize = 536870912
18/10/29 10:49:48 DEBUG azure.NativeAzureFileSystem: Getting the file status for wasbs:// /user
18/10/29 10:49:48 DEBUG azure.AzureNativeFileSystemStore: Retrieving metadata for user
18/10/29 10:49:48 DEBUG azure.SelfThrottlingIntercept: SelfThrottlingIntercept:: SendingRequest: threadId=1, requestType=read , isFirstRequest=true, sleepDuration=0
Can anyone please help with this? Let me know if any additional configuration is required.
Regards,
Vishal
Created 11-29-2019 03:06 PM
Hi,
One reason may be that the cluster nodes on which the MapReduce job runs cannot resolve the Azure storage DNS name.
You need to update the hosts file (whether on Windows or Linux) to add:
IP_Of_The_Storage DNS_Name_Of_The_Storage
This has probably been done on the machine where you are running the command, but every machine in the cluster needs to be able to reach the storage endpoint and resolve its DNS name (see the quick check below).
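For example, a check along these lines on each node would show whether the name resolves (the account name and IP below are placeholders):
# Run on every node of the cluster, not just the edge node
getent hosts xxxxxxxx.blob.core.windows.net || echo "cannot resolve the storage endpoint on $(hostname)"
# If resolution fails, add an entry like this to /etc/hosts (or the Windows hosts file):
# 52.239.x.x   xxxxxxxx.blob.core.windows.net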
Created 11-30-2019 02:02 AM
Can you share how you did your ADLS setup? Please have a look at the Hadoop Azure Support: Azure Blob Storage documentation, in particular the JARs and the credentials that need to be created in a secure environment.
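As a minimal sketch of what that usually involves (the account and container names are the same placeholders used above, and in a secure environment the key should go into a Hadoop credential provider rather than onto the command line):
# The hadoop-azure and azure-storage JARs ship with HDP and are normally already on the client classpath.
# Supply the storage account key for the wasbs:// scheme; <storage-access-key> is a placeholder.
hdfs dfs -D fs.azure.account.key.xxxxxxxx.blob.core.windows.net=<storage-access-key> \
    -ls wasbs://xxxx@xxxxxxxx.blob.core.windows.net/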