Created on 05-22-2015 09:14 AM - edited 09-16-2022 02:30 AM
The Spark Jar Location (HDFS) (spark_jar_hdfs_path) parameter is set to /user/spark/share/lib/spark-assembly.jar
However, the HDFS file /user/spark/share/lib/spark-assembly.jar is NOT there!
The only HDFS folder/file for Spark that exists is /user/spark/applicationHistory
Although I ran the 'Upload Spark Jar' action in CM (from the Actions drop-down) and CM reported that it succeeded, the jar (spark-assembly.jar) is not there when I check the Spark HDFS folders/files!
Created 05-22-2015 10:17 AM
I don't think that is used anymore in recent CDH; this is not how the assembly is distributed. What problem are you having?
Created 05-22-2015 10:53 AM
Interesting...
Somehow, the Spark parameter spark_jar_hdfs_path is set to the (HDFS) value '/user/spark/share/lib/spark-assembly.jar', and CM complains about 'Failed parameter validation'!
Should I unset it??
Created 05-22-2015 12:16 PM
If it's set, it probably needs to be an hdfs: path, but I don't think this setting matters in recent CDH.
Created 05-22-2015 12:46 PM
Should I unset it?
CM keeps complaining...
Created 05-22-2015 01:11 PM
Also, what should the Spark user's HDFS folder structure look like?
So far I have only one HDFS folder:
/user/spark/applicationHistory
Created 05-25-2015 06:34 PM
In a recent version (CM/CDH 5.4, for example) the directory should look just like what you have now. We no longer push the assembly separately. By default, Spark uses the assembly installed on the nodes, which is faster than using the one from HDFS. The setting is still there to allow custom assemblies to be used.
The setting should be entered without the hdfs: prefix in front; the path will be pushed out with the prefix added (CM will handle that for you). Which version of CDH and CM are you using?
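To illustrate the point about the path form, here is a minimal Python sketch, not anything CM itself runs. It strips an hdfs: prefix from a path so that the value matches what should be typed into the setting, and builds an `hdfs dfs -test -e` command to check whether a custom assembly really exists in HDFS. The helper names and the `nameservice1` host in the usage note are hypothetical, chosen only for this example.

```python
import subprocess
from typing import List
from urllib.parse import urlparse


def cm_path_value(path: str) -> str:
    """Return the path without an hdfs: scheme or authority.

    Hypothetical helper: this is the bare form to enter in the CM setting,
    since CM adds the prefix itself when it pushes the value out.
    """
    parsed = urlparse(path)
    if parsed.scheme == "hdfs":
        return parsed.path
    return path


def hdfs_exists_command(path: str) -> List[str]:
    """Build the CLI call that tests whether a path exists in HDFS."""
    return ["hdfs", "dfs", "-test", "-e", cm_path_value(path)]


def hdfs_file_exists(path: str) -> bool:
    """Run the check; `hdfs dfs -test -e` exits with 0 when the path exists.

    Assumes the `hdfs` client is on PATH and configured for the cluster.
    """
    return subprocess.run(hdfs_exists_command(path)).returncode == 0
```

For example, `cm_path_value("hdfs://nameservice1/user/spark/share/lib/spark-assembly.jar")` gives the bare path `/user/spark/share/lib/spark-assembly.jar`, and `hdfs_file_exists` on that path would show whether the upload actually put a jar there. If you are not using a custom assembly, none of this is needed.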
Wilfred
Created 05-26-2015 12:28 PM
I have upgraded both CM & CDH to the 5.4 release.
Created 05-26-2015 06:11 PM
In CM & CDH 5.4 you should unset it and let it use the one that is there on the nodes. Much faster.
Wilfred