Created 03-21-2017 09:44 AM
I set up a small HDP 2.4 cluster with Cloudbreak. After startup, the Spark history server fails. In the log I find:
Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///spark-history
I found these entries in the spark-defaults.conf file:
I'm using WASB storage for the cluster. I have no idea what host to set these URIs to.
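For context, the two spark-defaults.conf entries that carry this URI are presumably the history server's log-directory settings. Reconstructed from the error message above, so treat the exact values as an assumption, they would look like:

```
spark.eventLog.dir hdfs:///spark-history
spark.history.fs.logDirectory hdfs:///spark-history
```

With HDFS as the default filesystem the schema-only `hdfs:///` form resolves against fs.defaultFS, which is why it breaks when the default filesystem is not HDFS.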
Created 03-21-2017 09:53 AM
Hi,
Are you using WASB as the default filesystem? If so, check fs.defaultFS in the Hadoop configuration (core-site). I assume the directory should then look something like this: wasb://<container>@<storage_account>.blob.core.windows.net/spark-history
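One quick way to see the default filesystem is to read fs.defaultFS out of core-site.xml. A minimal sketch; on a real node the file usually lives at /etc/hadoop/conf/core-site.xml, and the sample file plus container/account names below are made up for illustration:

```shell
# Create a tiny sample core-site.xml (stand-in for /etc/hadoop/conf/core-site.xml).
cat > /tmp/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>wasb://mycontainer@myaccount.blob.core.windows.net</value>
  </property>
</configuration>
EOF

# Print the value that follows the fs.defaultFS property name.
grep -A1 '<name>fs.defaultFS</name>' /tmp/core-site.xml \
  | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
# → wasb://mycontainer@myaccount.blob.core.windows.net
```

If the printed value starts with wasb://, the cluster's default filesystem is WASB and the spark-history URIs need the full wasb:// form.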
Created 03-23-2017 12:35 PM
I changed the directory to one based on the WASB defaultFS, but now Spark throws a ClassNotFoundException:
java.lang.ClassNotFoundException: com.microsoft.azure.storage.blob.BlobListingDetails
You'd expect this class to be available on a system supporting WASB...
Could I add some Java options somewhere so it can find this class?
It's also strange that an HDInsight cluster is set up just like the Cloudbreak default, yet there it simply works: the Spark history server doesn't complain about the default hdfs:///spark-history URI.
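On the Java-options question: in principle the history server's classpath can be extended in spark-env.sh instead of patching any jars. Whether that resolves this particular error is untested here, and the jar path below is taken from a HDP 2.5 node, so verify it on your own cluster first:

```shell
# Assumption: azure-storage jar location; locate the real one with
#   find /usr/hdp/ -name "azure-storage*.jar"
# Spark 1.6 daemons still honour SPARK_CLASSPATH from spark-env.sh.
export SPARK_CLASSPATH="$SPARK_CLASSPATH:/usr/hdp/2.5.0.1-210/hadoop/lib/azure-storage-2.2.0.jar"
```

Note that if the assembly jar already contains an older copy of the class, classpath ordering decides which one wins, so this may not be enough on its own.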
Created 03-23-2017 02:55 PM
Hi,
I can reproduce your issue on HDP 2.5 as well. It seems the Spark assembly was built with invalid Azure Storage classes. To fix this quickly, I did the following:
mkdir -p /tmp/jarupdate && cd /tmp/jarupdate
find /usr/hdp/ -name "azure-storage*.jar"
cp /usr/hdp/2.5.0.1-210/hadoop/lib/azure-storage-2.2.0.jar .
cp /usr/hdp/current/spark-historyserver/lib/spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar .
unzip azure-storage-2.2.0.jar
jar uf spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar com/
mv -f spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar /usr/hdp/current/spark-historyserver/lib/spark-assembly-1.6.3.2.5.0.1-210-hadoop2.7.3.2.5.0.1-210.jar
cd .. && rm -rf /tmp/jarupdate
Basically I put the desired class files into the assembly jar and updated the original jar file. Once it's done just start the history server and it should be ok.
I changed two configuration values in spark-defaults.conf:
spark.eventLog.dir = wasb://cloudbreak492@kriszwasbnorth.blob.core.windows.net/spark-history
spark.history.fs.logDirectory = wasb://cloudbreak492@kriszwasbnorth.blob.core.windows.net/spark-history
Once the history server started, I was able to run the Spark TeraGen job:
spark-submit --class com.nexr.spark.terasort.TeraGen --deploy-mode cluster --master yarn-cluster --num-executors 1 spark-terasort-0.1.jar 1G wasb://cloudbreak492@kriszwasbnorth.blob.core.windows.net/teradata
Created 03-23-2017 07:01 PM
Thanks, that did the trick. I guess this is a bug. Where should I report it? A quick Google search for 'hortonworks bug report' didn't turn up anything useful.
Created 03-23-2017 10:08 PM
I think this is an Ambari bug. You can report the issue here after registering: http://issues.apache.org/jira/browse/AMBARI . Also, could you please accept the answer here so others can find the solution while it's being fixed? Thank you.
Br,
Krisz