
Copy data from Cloudera HDFS to Azure Blob Storage

On CDH 5.10.2, we need to copy data from HDFS to Azure Blob Storage, but we are having problems writing files.

  • We configured the Azure storage account and verified access with Azure Storage Explorer.
  • We added the credentials (account + key) to core-site.xml and restarted.
  • We then tested the distcp command, but the following error appears:

    hadoop distcp /user/myuser/file1.txt wasb://cont1@testblobsAccount1.blob.core.windows.net/folder1/ -log /usr/myuser/

    18/03/08 20:20:59 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/user/myuser/file1.txt, wasb://cont1@testblobsAccount1.blob.core.windows.net/folder1, -log], targetPath=/usr/myuser, targetPathExists=false, filtersFile='null'}
    18/03/08 20:20:59 INFO client.RMProxy: Connecting to ResourceManager at xxxx.xxxx.test/1.1.1.1:8032
    18/03/08 20:20:59 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties
    18/03/08 20:20:59 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
    18/03/08 20:20:59 INFO impl.MetricsSystemImpl: azure-file-system metrics system started
    18/03/08 20:21:03 ERROR tools.DistCp: Exception encountered
    org.apache.hadoop.fs.azure.AzureException: com.microsoft.windowsazure.storage.StorageException: The value for one of the HTTP headers is not in the correct format.
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1907)
        at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1587)
        at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
        at org.apache.hadoop.fs.Globber.doGlob(Globber.java:272)
        at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
        at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1703)
        at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
        at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
        at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:377)
        at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:90)
        at org.apache.hadoop.tools.DistCp.execute(DistCp.java:179)
        at org.apache.hadoop.tools.DistCp.run(DistCp.java:141)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
    Caused by: com.microsoft.windowsazure.storage.StorageException: The value for one of the HTTP headers is not in the correct format.
        at com.microsoft.windowsazure.storage.StorageException.translateFromHttpStatus(StorageException.java:212)
        at com.microsoft.windowsazure.storage.StorageException.translateException(StorageException.java:173)
        at com.microsoft.windowsazure.storage.core.StorageRequest.materializeException(StorageRequest.java:306)
        at com.microsoft.windowsazure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:229)
        at com.microsoft.windowsazure.storage.blob.CloudBlobContainer.downloadAttributes(CloudBlobContainer.java:516)
        at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobContainerWrapperImpl.downloadAttributes(StorageInterfaceImpl.java:233)
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.checkContainer(AzureNativeFileSystemStore.java:1091)
        at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1823)
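
One thing visible in the Input Options line of the log: `-log` ended up in sourcePaths, the wasb URL was treated as a source, and targetPath became /usr/myuser. DistCp expects its options before the source and target paths, so the command was probably not parsed the way we intended. Below is a minimal sketch of the same command with `-log` moved in front of the paths. The variable names are only for illustration, and whether this reordering has any effect on the StorageException itself is an assumption we have not verified.

```shell
# Paths copied from the original command; the variable names are hypothetical.
src="/user/myuser/file1.txt"
dest="wasb://cont1@testblobsAccount1.blob.core.windows.net/folder1/"
log_dir="/usr/myuser/"

# DistCp usage is: hadoop distcp [OPTIONS] <source>... <target>
# so -log goes before the source and target paths.
cmd="hadoop distcp -log $log_dir $src $dest"

# Print the command for review instead of running it directly.
echo "$cmd"

# Optional sanity check of the wasb credentials before copying
# (run manually on a cluster node):
#   hadoop fs -ls "$dest"
```

With this order, the Input Options line should list only /user/myuser/file1.txt under sourcePaths and show the wasb URL as targetPath.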
