Access Azure Blob Storage from on premise Hadoop cluster

New Contributor

I am trying to access Azure Blob Storage from an on-premise Hadoop cluster.

These are the steps we are following (reference: http://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html):

  1. Update core-site.xml with the following property:

<property>
  <name>fs.azure.account.key.<storage-name>.blob.core.windows.net</name>
  <value>Access key</value>
</property>

  2. Restart the cluster.
  3. Try to access the Azure container using the following commands:
       hadoop fs -mkdir wasb://test@storage-name.blob.core.windows.net/testDir
       hdfs dfs -ls wasbs://test@storagename.blob.core.windows.net/
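
For reference, the URI shape used by these commands can be sketched as a small shell helper. This is only an illustration and not part of the original steps: `wasb_uri` is a hypothetical helper name, and `test` and `storagename` are the placeholder container and account names used above. The scheme is `wasb` for plain HTTP and `wasbs` for HTTPS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a WASB URI of the form
#   wasb[s]://<container>@<account>.blob.core.windows.net/<path>
# Arguments: container, storage account, path (optional), scheme (optional, default "wasb").
wasb_uri() {
  local container="$1" account="$2" path="${3:-}" scheme="${4:-wasb}"
  # Strip one leading slash so the result has exactly one "/" after the host.
  path="${path#/}"
  printf '%s://%s@%s.blob.core.windows.net/%s\n' "$scheme" "$container" "$account" "$path"
}

# Usage (needs a configured Hadoop client, so it is left commented out):
#   hadoop fs -mkdir "$(wasb_uri test storagename testDir)"
#   hdfs dfs -ls "$(wasb_uri test storagename '' wasbs)"
```

Note that the path is joined to the host with no space; a space before `testDir` would make the shell pass it as a separate argument.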

Here we are stuck, getting the following error:

hadoop fs -mkdir wasb://test@storage.blob.core.windows.net/testDir

19/12/04 08:46:37 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-azure-file-system.properties,hadoop-metrics2.properties

19/12/04 08:46:37 INFO impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).

19/12/04 08:46:37 INFO impl.MetricsSystemImpl: azure-file-system metrics system started

mkdir: com.microsoft.windowsazure.storage.StorageException: An unknown failure occurred : Connection timed out (Connection timed out)

19/12/04 09:03:25 INFO impl.MetricsSystemImpl: Stopping azure-file-system metrics system...

19/12/04 09:03:25 INFO impl.MetricsSystemImpl: azure-file-system metrics system stopped.

19/12/04 09:03:25 INFO impl.MetricsSystemImpl: azure-file-system metrics system shutdown complete.
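
A `Connection timed out` from the storage client suggests that the TCP connection to the blob endpoint never completed. On an on-premise network that often points to a firewall or a missing proxy, although the log itself does not say which. A minimal sketch for checking reachability from a cluster node with `curl` follows; `blob_endpoint` and `probe_blob` are hypothetical helper names, and the account name and proxy URL are placeholders.

```shell
#!/usr/bin/env bash
# Build the HTTPS blob endpoint for a storage account name.
blob_endpoint() {
  printf 'https://%s.blob.core.windows.net/\n' "$1"
}

# Probe the endpoint with a 10-second connect timeout and print the HTTP status code.
# Arguments: storage account, proxy URL (optional, e.g. http://proxy.example.com:8080).
# Any HTTP status, even a 4xx, means the network path works; curl exits
# non-zero if the connection cannot be made within the timeout.
probe_blob() {
  local url proxy="${2:-}"
  url="$(blob_endpoint "$1")"
  if [ -n "$proxy" ]; then
    curl -sS -o /dev/null -w '%{http_code}\n' --connect-timeout 10 -x "$proxy" "$url"
  else
    curl -sS -o /dev/null -w '%{http_code}\n' --connect-timeout 10 "$url"
  fi
}

# Usage (needs network access, so it is left commented out):
#   probe_blob storagename
#   probe_blob storagename http://proxy.example.com:8080
```

If the probe only succeeds with the proxy argument, the Hadoop client JVM most likely needs the same proxy settings.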

 

I am able to hit the REST APIs after setting a proxy. I even tried setting the proxy for the hadoop distcp command, as follows:

hadoop distcp -D mapreduce.map.java.opts="$DISTCP_PROXY_OPTS" -D mapreduce.reduce.java.opts="$DISTCP_PROXY_OPTS" -update -skipcrccheck -numListstatusThreads 40 hdfs://nameservice1/WMA/test-az.csv wasb://test@pallavistorage.blob.core.windows.net/

 

But I am still facing the same issue.

How can I resolve this timeout issue?
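
The contents of `$DISTCP_PROXY_OPTS` are not shown above. One thing worth checking: `mapreduce.map.java.opts` only affects the task JVMs, while `hadoop fs -mkdir` and the client-side part of `distcp` run in the local client JVM. The following is a minimal sketch, assuming the variable is meant to carry the standard Java proxy system properties and that the WASB client honors them (this is not confirmed in this thread). `jvm_proxy_opts` is a hypothetical helper, and the proxy host and port are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the standard JVM proxy flags for a host and port.
jvm_proxy_opts() {
  local host="$1" port="$2"
  printf '%s %s %s %s\n' \
    "-Dhttp.proxyHost=$host" "-Dhttp.proxyPort=$port" \
    "-Dhttps.proxyHost=$host" "-Dhttps.proxyPort=$port"
}

# Placeholder proxy; replace with the real host and port.
DISTCP_PROXY_OPTS="$(jvm_proxy_opts proxy.example.com 8080)"

# HADOOP_CLIENT_OPTS is appended to the JVM options of client commands
# such as "hadoop fs" and "hadoop distcp".
export HADOOP_CLIENT_OPTS="${HADOOP_CLIENT_OPTS:-} $DISTCP_PROXY_OPTS"

# Usage (needs a configured Hadoop client, so it is left commented out):
#   hadoop fs -ls wasbs://test@storagename.blob.core.windows.net/
```

With this in place, the same flags reach both the client JVM (through `HADOOP_CLIENT_OPTS`) and, when passed with `-D mapreduce.map.java.opts`, the map tasks.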

 

1 REPLY

Contributor

Hi @redwuie,

You can do this with an Azure HDInsight cluster, as described at the link below (note that this setup is not on-premise):

http://dbmentors.blogspot.com/2018/02/integrating-hadoop-cluster-with.html

 

However, I tried many times to do the same with an on-premise Hadoop cluster, and it did not work.

As far as I could find, there is only one solution: if you want to move data from an on-premise Hadoop cluster to Azure Data Lake or Blob Storage, you have to use Azure Data Box.

 

 

Thanks

HadoopHelp