Explorer
Posts: 6
Registered: ‎03-13-2018

Copy data from cloudera hdfs to azure blob storage

On CDH 5.10.2, we need to copy data from HDFS to Azure, but we are having problems writing the files.

  • We configured the Azure account and tested access from Azure Storage Explorer.
  • We configured core-site.xml with the credentials (account + key) and restarted.
  • We tested the distcp command, but the following error appears:

    hadoop distcp /user/myuser/file1.txt wasb://cont1@testblobsAccount1.blob.core.windows.net/folder1/ -log /usr/myuser/

18/03/08 20:20:59 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, overwrite=false, append=false, useDiff=false, useRdiff=false, fromSnapshot=null, toSnapshot=null, skipCRC=false, blocking=true, numListstatusThreads=0, maxMaps=20, mapBandwidth=100, sslConfigurationFile='null', copyStrategy='uniformsize', preserveStatus=[], preserveRawXattrs=false, atomicWorkPath=null, logPath=null, sourceFileListing=null, sourcePaths=[/user/myuser/file1.txt, wasb://cont1@testblobsAccount1.blob.core.windows.net/folder1, -log], targetPath=/usr/myuser, targetPathExists=false, filtersFile='null'}
18/03/08 20:20:59 INFO client.RMProxy: Connecting to ResourceManager at xxxx.xxxx.test/1.1.1.1:8032
18/03/08 20:20:59 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties
18/03/08 20:20:59 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
18/03/08 20:20:59 INFO impl.MetricsSystemImpl: azure-file-system metrics system started
18/03/08 20:21:03 ERROR tools.DistCp: Exception encountered
org.apache.hadoop.fs.azure.AzureException: com.microsoft.windowsazure.storage.StorageException: The value for one of the HTTP headers is not in the correct format.
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1907)
    at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:1587)
    at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:64)
    at org.apache.hadoop.fs.Globber.doGlob(Globber.java:272)
    at org.apache.hadoop.fs.Globber.glob(Globber.java:151)
    at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:1703)
    at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:77)
    at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
    at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:377)
    at org.apache.hadoop.tools.DistCp.prepareFileListing(DistCp.java:90)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:179)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:141)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:441)
Caused by: com.microsoft.windowsazure.storage.StorageException: The value for one of the HTTP headers is not in the correct format.
    at com.microsoft.windowsazure.storage.StorageException.translateFromHttpStatus(StorageException.java:212)
    at com.microsoft.windowsazure.storage.StorageException.translateException(StorageException.java:173)
    at com.microsoft.windowsazure.storage.core.StorageRequest.materializeException(StorageRequest.java:306)
    at com.microsoft.windowsazure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:229)
    at com.microsoft.windowsazure.storage.blob.CloudBlobContainer.downloadAttributes(CloudBlobContainer.java:516)
    at org.apache.hadoop.fs.azure.StorageInterfaceImpl$CloudBlobContainerWrapperImpl.downloadAttributes(StorageInterfaceImpl.java:233)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.checkContainer(AzureNativeFileSystemStore.java:1091)
    at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:1823)
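
A side note on the command itself, separate from the Azure exception: the "Input Options" line in the log lists `-log` among sourcePaths, with targetPath=/usr/myuser and logPath=null. That suggests DistCp stopped parsing options at the first path, so the wasb:// URL was treated as a source rather than the target. A minimal sketch of the same command with the option moved in front of the paths follows. The `run` dry-run helper is an illustration only and not part of Hadoop, and reordering alone probably does not resolve the StorageException.

```shell
# Dry-run helper (hypothetical, for illustration): prints the command
# unless DRY_RUN=0 is set, in which case it executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

SRC="/user/myuser/file1.txt"
DST="wasb://cont1@testblobsAccount1.blob.core.windows.net/folder1/"
LOG_DIR="/usr/myuser/"   # taken verbatim from the original command

# Options first, then the source path(s), then the target as the last argument.
run hadoop distcp -log "$LOG_DIR" "$SRC" "$DST"
```

Run it as-is to review the printed command, then with `DRY_RUN=0` to execute it on a node that has the Hadoop client installed.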

Explorer
Posts: 17
Registered: ‎12-07-2018

Re: Copy data from cloudera hdfs to azure blob storage

You wrote that you configured core-site.xml with the credentials (account + key) and restarted.
What do you mean by "restart"? Was it a restart of the HDFS service across the whole cluster?
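
One way to check the credentials independently of any restart is to pass the account key for a single command with the generic `-D` option and list the container. This is only a sketch: it assumes the usual hadoop-azure property name `fs.azure.account.key.<account>.blob.core.windows.net`, the key value is a placeholder, and the `run` dry-run helper is hypothetical. Keep in mind that a key passed on the command line can end up in shell history.

```shell
# Dry-run helper (hypothetical, for illustration): prints the command
# unless DRY_RUN=0 is set, in which case it executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

ACCOUNT="testblobsAccount1"   # account name from the thread
CONTAINER="cont1"             # container name from the thread
# Placeholder: supply the real key through the ACCOUNT_KEY environment variable.
ACCOUNT_KEY="${ACCOUNT_KEY:-<storage-account-key>}"

# Property that carries the storage account key for wasb:// access.
KEY_PROP="fs.azure.account.key.${ACCOUNT}.blob.core.windows.net"

# List the container root using the key for this invocation only.
run hadoop fs -D "${KEY_PROP}=${ACCOUNT_KEY}" \
  -ls "wasb://${CONTAINER}@${ACCOUNT}.blob.core.windows.net/"
```

If the listing fails with the same StorageException, the problem is unlikely to be related to which services were restarted.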
Explorer
Posts: 6
Registered: ‎03-13-2018

Re: Copy data from cloudera hdfs to azure blob storage

I restarted the HDFS service and then the whole cluster, with no change. In the end I used adl (Azure Data Lake) instead of wasb.

Posts: 899
Kudos: 28
Solutions: 12
Registered: ‎05-27-2014

Re: Copy data from cloudera hdfs to azure blob storage

Hi @cgomezfl,

Regarding the original error message:

com.microsoft.windowsazure.storage.StorageException: The value for one of the HTTP headers is not in the correct format.

This error occurs if the Azure Account Kind is not set properly when the storage account is created. To correct this, set the Account Kind to General Purpose and the Access type to Blob within the Blob service configuration.
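
For anyone who wants to confirm the account kind before recreating anything, it can be read with the Azure CLI. The sketch below assumes the `az` CLI is installed and logged in. The resource group name and the `run` dry-run helper are hypothetical, and the account name is the one from the thread. As far as I know, general-purpose accounts are reported as `Storage` or `StorageV2`.

```shell
# Dry-run helper (hypothetical, for illustration): prints the command
# unless DRY_RUN=0 is set, in which case it executes it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

ACCOUNT="testblobsAccount1"          # account name from the thread
RESOURCE_GROUP="my-resource-group"   # hypothetical resource group name

# Print only the "kind" field of the storage account.
run az storage account show \
  --name "$ACCOUNT" \
  --resource-group "$RESOURCE_GROUP" \
  --query kind \
  --output tsv
```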

 

Hope this helps!

Thanks,

Li

Li Wang, Technical Resolution Manager

