Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Need instructions to setup WASB as storage for HDP on Azure IaaS

avatar
Contributor

Some customers opt to set up HDP on Azure IaaS instead of HDInsight. They may still need to persist data in the Azure Blob Store. If you have setup HDP on Azure IaaS with WASB for storage please share your setup instructions and related best practices so all can benefit.

1 ACCEPTED SOLUTION

avatar

The main thing that needs to be done to enable access to WASB is to configure the credentials. This involves editing core-site.xml, and the instructions are documented in Apache here:

http://hadoop.apache.org/docs/r2.7.1/hadoop-azure/index.html

After configuring the credentials, you'll be able to access WASB files by specifying URLs that use "wasb" as the scheme. HDInsight clusters also make WASB the default file system by setting configuration property fs.defaultFS to a wasb URL. The same Apache documentation shows usage examples for both cases.

Note that setting fs.defaultFS is only done if your intention is to replace HDFS fully as the default file system. To my knowledge, this has only ever been done in HDInsight.

View solution in original post

10 REPLIES 10

avatar
New Contributor

Hello,

I have set up the my cluster with the above configurations and everything is working fine except spark. I am receiving a lot of errors while starting spark-shell, after changing the storage from HDFS to WASB. The errors look like the following:

java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState': at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$reflect(SparkSession.scala:983) at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110) at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109) at org.apache.spark.sql.SparkSession$Builder$anonfun$getOrCreate$5.apply(SparkSession.scala:878) at org.apache.spark.sql.SparkSession$Builder$anonfun$getOrCreate$5.apply(SparkSession.scala:878) at scala.collection.mutable.HashMap$anonfun$foreach$1.apply(HashMap.scala:99) at scala.collection.mutable.HashMap$anonfun$foreach$1.apply(HashMap.scala:99) at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230) at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40) at scala.collection.mutable.HashMap.foreach(HashMap.scala:99) at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878) at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96) ... 47 elided Caused by: java.lang.reflect.InvocationTargetException: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog': at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$reflect(SparkSession.scala:980) ... 58 more Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog': at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$reflect(SharedState.scala:176) at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86) at org.apache.spark.sql.SparkSession$anonfun$sharedState$1.apply(SparkSession.scala:101) at org.apache.spark.sql.SparkSession$anonfun$sharedState$1.apply(SparkSession.scala:101) at scala.Option.getOrElse(Option.scala:121) at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101) at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100) at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157) at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32) ... 63 more Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details. at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$reflect(SharedState.scala:173) ... 71 more Caused by: java.lang.reflect.InvocationTargetException: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details. at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262) at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:65) ... 76 more Caused by: java.lang.RuntimeException: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details. at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522) at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188) ... 84 more Caused by: org.apache.hadoop.fs.azure.AzureException: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details. at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2027) at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2081) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447) at org.apache.hadoop.fs.azure.NativeAzureFileSystem.conditionalRedoFolderRename(NativeAzureFileSystem.java:2137) at org.apache.hadoop.fs.azure.NativeAzureFileSystem.getFileStatus(NativeAzureFileSystem.java:2104) at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1447) at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596) at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508) ... 85 more Caused by: java.util.NoSuchElementException: An error occurred while enumerating the result, check the original exception for details. at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:113) at org.apache.hadoop.fs.azure.StorageInterfaceImpl$WrappingIterator.hasNext(StorageInterfaceImpl.java:130) at org.apache.hadoop.fs.azure.AzureNativeFileSystemStore.retrieveMetadata(AzureNativeFileSystemStore.java:2006) ... 93 more Caused by: com.microsoft.azure.storage.StorageException: The server encountered an unknown failure: OK at com.microsoft.azure.storage.StorageException.translateException(StorageException.java:101) at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:199) at com.microsoft.azure.storage.core.LazySegmentedIterator.hasNext(LazySegmentedIterator.java:109) ... 95 more Caused by: java.lang.ClassCastException: org.apache.xerces.parsers.XIncludeAwareParserConfiguration cannot be cast to org.apache.xerces.xni.parser.XMLParserConfiguration at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source) at org.apache.xerces.parsers.SAXParser.<init>(Unknown Source) at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.<init>(Unknown Source) at org.apache.xerces.jaxp.SAXParserImpl.<init>(Unknown Source) at org.apache.xerces.jaxp.SAXParserFactoryImpl.newSAXParser(Unknown Source) at com.microsoft.azure.storage.core.Utility.getSAXParser(Utility.java:668) at com.microsoft.azure.storage.blob.BlobListHandler.getBlobList(BlobListHandler.java:72) at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1284) at com.microsoft.azure.storage.blob.CloudBlobContainer$6.postProcessResponse(CloudBlobContainer.java:1248) at com.microsoft.azure.storage.core.ExecutionEngine.executeWithRetry(ExecutionEngine.java:146) ... 96 more <console>:14: error: not found: value spark import spark.implicits._ ^ <console>:14: error: not found: value spark import spark.sql

Please help in resolving the issue and let me know if it is possible at all, for spark to function in the WASB storage environment.

Regards,

Subhankar