Spark HDFS URI missing host on HDP 2.4

I set up a small HDP 2.4 cluster with Cloudbreak. After startup, the Spark History Server fails. In the log I find:

Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///spark-history

I found these entries in the spark-defaults.conf file:

  • spark.history.fs.logDirectory hdfs:///spark-history
  • spark.eventLog.dir hdfs:///spark-history

I'm using WASB storage for the cluster, and I have no idea what host to put in these URIs.

1 ACCEPTED SOLUTION

Expert Contributor

Hi,

Are you using WASB as the default filesystem? If so, check core-site.xml for fs.defaultFS. I assume the URIs should look something like this: wasb://<container>@<storage_account>.blob.core.windows.net/spark-history
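
For reference, here is one way to look up the default filesystem from the command line, and what the matching spark-defaults.conf entries would then look like. This is a sketch; the container and storage account names are placeholders, not values from your cluster:

# print the configured default filesystem (fs.defaultFS from core-site.xml)
hdfs getconf -confKey fs.defaultFS
# example output: wasb://<container>@<storage_account>.blob.core.windows.net

# spark-defaults.conf then reuses the same scheme and authority:
spark.eventLog.dir wasb://<container>@<storage_account>.blob.core.windows.net/spark-history
spark.history.fs.logDirectory wasb://<container>@<storage_account>.blob.core.windows.net/spark-history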

5 REPLIES


I changed the directory to one based on the WASB defaultFS, but now Spark throws a class-not-found error:

java.lang.ClassNotFoundException: com.microsoft.azure.storage.blob.BlobListingDetails

You'd expect this class to be available on a system supporting WASB...

Could I add some Java options somewhere so it can find this class?

Also strange: when I look at the setup of an HDInsight cluster, it is configured just like the Cloudbreak default, and there it simply works; the Spark History Server doesn't complain about the default hdfs:///spark-history URI.

Expert Contributor

Hi,

I can reproduce your issue on HDP 2.5 as well. It seems the Spark assembly contains invalid Azure storage classes. To fix this quickly, I did the following:

# work in a scratch directory
mkdir -p /tmp/jarupdate && cd /tmp/jarupdate
# locate the azure-storage jar that ships with the Hadoop libs
find /usr/hdp/ -name "azure-storage*.jar"
# copy the azure-storage jar and the Spark assembly jar here
cp /usr/hdp/2.5.0.1-210/hadoop/lib/azure-storage-2.2.0.jar .
cp /usr/hdp/current/spark-historyserver/lib/spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar .
# extract the azure-storage classes and merge them into the assembly
unzip azure-storage-2.2.0.jar
jar uf spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar com/
# put the patched assembly back in place and clean up
mv -f spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar /usr/hdp/current/spark-historyserver/lib/spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar
cd .. && rm -rf /tmp/jarupdate

Basically, I put the desired class files into the assembly jar and updated the original jar file. Once that's done, just start the History Server and it should be OK.
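
To double-check the patched assembly before restarting (a quick sanity check, not one of the original steps):

# the missing class should now be listed inside the assembly jar
unzip -l /usr/hdp/current/spark-historyserver/lib/spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar | grep BlobListingDetails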

I changed two configurations in spark-defaults.conf:

spark.eventLog.dir = wasb://cloudbreak492@kriszwasbnorth.blob.core.windows.net/spark-history
spark.history.fs.logDirectory = wasb://cloudbreak492@kriszwasbnorth.blob.core.windows.net/spark-history

Once the History Server started, I was able to run the Spark TeraGen job:

spark-submit --class com.nexr.spark.terasort.TeraGen --deploy-mode cluster --master yarn-cluster --num-executors 1 spark-terasort-0.1.jar 1G wasb://cloudbreak492@kriszwasbnorth.blob.core.windows.net/teradata
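
As a further check (not part of the original steps), you can list the event-log directory with the Hadoop client, assuming the client is configured for WASB access:

# event logs should appear here after a job finishes
hdfs dfs -ls wasb://cloudbreak492@kriszwasbnorth.blob.core.windows.net/spark-history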

Thanks, that did the trick. I guess this is a bug; where should I report it? A quick Google search for 'hortonworks bug report' didn't turn up anything useful.

Expert Contributor

I think this is an Ambari bug. You can report the issue here after you've registered: http://issues.apache.org/jira/browse/AMBARI. Also, could you please accept the answer here so others can find the solution while it's being fixed? Thank you.

Br,

Krisz
