Spark HDFS URI missing host on HDP 2.4

Contributor

I set up a small HDP 2.4 cluster with Cloudbreak. After startup, the Spark History Server fails. In the log I find:

Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///spark-history

I found these entries in the spark-defaults.conf file:

  • spark.history.fs.logDirectory hdfs:///spark-history
  • spark.eventLog.dir hdfs:///spark-history

I'm using WASB storage for the cluster, and I have no idea what host to set these URIs to.

5 REPLIES

Super Collaborator (accepted solution)

Hi,

Are you using WASB as the default filesystem? If so, check core-site.xml for fs.defaultFS; the directory should then look something like this: wasb://<container>@<storage_account>.blob.core.windows.net/spark-history
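
A quick way to verify this, as a sketch (the container and account names below are placeholders, not from the original posts):

# Print the configured default filesystem (fs.defaultFS comes from core-site.xml)
hdfs getconf -confKey fs.defaultFS
# On a WASB-backed cluster this prints something like
#   wasb://mycontainer@myaccount.blob.core.windows.net
# and the two spark-defaults.conf entries would then become
#   spark.eventLog.dir wasb://mycontainer@myaccount.blob.core.windows.net/spark-history
#   spark.history.fs.logDirectory wasb://mycontainer@myaccount.blob.core.windows.net/spark-history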

Contributor

I changed the directory to one based on the WASB defaultFS, but now Spark throws a class-not-found error:

java.lang.ClassNotFoundException: com.microsoft.azure.storage.blob.BlobListingDetails

You'd expect this class to be available on a system that supports WASB...

Could I add some Java options somewhere so it can find this class?

Also strange: when I look at the setup of an HDInsight cluster, it is configured just like the default Cloudbreak setup, and there it simply works; the Spark History Server doesn't complain about the default hdfs:///spark-history URI.
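
One way to narrow this down, as a sketch (not from the original posts; the path assumes the standard HDP layout), is to check whether the Spark assembly on the History Server host actually bundles the missing class:

# Find the assembly jar and look for the Azure Storage class inside it
ASSEMBLY=$(find /usr/hdp/current/spark-historyserver/lib -name "spark-assembly*.jar" | head -1)
unzip -l "$ASSEMBLY" | grep 'com/microsoft/azure/storage/blob/BlobListingDetails'
# No output means the class is not bundled in the assembly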

Super Collaborator

Hi,

I can reproduce your issue on HDP 2.5 as well. It seems the Spark assembly bundles invalid Azure Storage classes. To fix this quickly, I did the following:

# Work in a scratch directory
mkdir -p /tmp/jarupdate && cd /tmp/jarupdate
# Locate the Azure Storage jar that ships with Hadoop
find /usr/hdp/ -name "azure-storage*.jar"
# Copy the Azure Storage jar and the Spark assembly into the scratch directory
cp /usr/hdp/2.5.0.1-210/hadoop/lib/azure-storage-2.2.0.jar .
cp /usr/hdp/current/spark-historyserver/lib/spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar .
# Unpack the Azure Storage classes and merge them into the assembly
unzip azure-storage-2.2.0.jar
jar uf spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar com/
# Move the patched assembly back into place and clean up
mv -f spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar /usr/hdp/current/spark-historyserver/lib/spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar
cd .. && rm -rf /tmp/jarupdate

Basically, I put the desired class files into the assembly jar and updated the original jar file. Once that's done, just start the History Server and it should be OK.
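
Before restarting, a quick sanity check (my addition, not part of the original steps) that the merge actually landed:

# The patched assembly should now list the class that was missing
jar tf /usr/hdp/current/spark-historyserver/lib/spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar | grep BlobListingDetails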

I changed two settings in spark-defaults.conf:

spark.eventLog.dir = wasb://cloudbreak492@kriszwasbnorth.blob.core.windows.net/spark-history
spark.history.fs.logDirectory = wasb://cloudbreak492@kriszwasbnorth.blob.core.windows.net/spark-history

Once the History Server started, I was able to run the Spark TeraGen job:

spark-submit --class com.nexr.spark.terasort.TeraGen --deploy-mode cluster --master yarn-cluster --num-executors 1 spark-terasort-0.1.jar 1G wasb://cloudbreak492@kriszwasbnorth.blob.core.windows.net/teradata

Contributor

Thanks, that did the trick. I guess this is a bug; where should I report it? A quick Google search for 'hortonworks bug report' didn't turn up anything useful.

Super Collaborator

I think this is an Ambari bug. You can report the issue here after you've registered: http://issues.apache.org/jira/browse/AMBARI. Also, could you please accept the answer here so others can find the solution while it's being fixed? Thank you.

Br,

Krisz