hdfs:/user/spark/share/lib/spark-assembly.jar is missing

Contributor

The Spark Jar Location (HDFS) (spark_jar_hdfs_path) parameter is set to /user/spark/share/lib/spark-assembly.jar

However, the HDFS file /user/spark/share/lib/spark-assembly.jar is NOT there!

The only HDFS folder/file for Spark that exists is /user/spark/applicationHistory

 

I ran 'Upload Spark Jar' from the Actions drop-down in CM, and CM reported that it succeeded. But when I check the Spark HDFS folders, the jar (spark-assembly.jar) is not there!
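
For anyone who wants to confirm what is actually in HDFS, here is a minimal sketch of that check. It assumes the `hdfs` command-line client is on the PATH of the machine where it runs; the helper names (`build_exists_command`, `hdfs_path_exists`, `report`) are made up for illustration and are not part of CM or CDH.

```python
import subprocess


def build_exists_command(path):
    """Build the `hdfs dfs -test -e` command line for an HDFS path."""
    return ["hdfs", "dfs", "-test", "-e", path]


def hdfs_path_exists(path):
    """Run the check. `-test -e` exits with 0 when the path exists.

    Assumption: the `hdfs` client is installed and configured on this host.
    """
    result = subprocess.run(build_exists_command(path))
    return result.returncode == 0


def report(paths, exists=hdfs_path_exists):
    """Map each path to True/False. `exists` can be swapped out for testing."""
    return {path: exists(path) for path in paths}
```

Calling `report(["/user/spark/share/lib/spark-assembly.jar", "/user/spark/applicationHistory"])` on a cluster node would show which of the two paths from this post is present.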

 

1 ACCEPTED SOLUTION

Super Collaborator

In CM & CDH 5.4 you should unset it and let Spark use the assembly that is already installed on the nodes. That is much faster.

 

Wilfred


REPLIES

Master Collaborator

I don't think that is used anymore in recent CDH; this is not how the assembly is distributed. What problem are you having?

Contributor

Interesting...

 

Somehow, the Spark parameter spark_jar_hdfs_path is set to the HDFS path '/user/spark/share/lib/spark-assembly.jar', and CM complains about 'Failed parameter validation'!

Should I unset it??

 

 

Master Collaborator

If it's set, it probably needs to be an hdfs: path, but I don't think this setting matters in recent CDH.

Contributor

Should I un-set it?

CM keeps complaining...

 

Contributor

Also, what should the Spark user's HDFS folder structure look like?

So far I have only one HDFS folder:

/user/spark/applicationHistory

 

Super Collaborator

In a recent version (CM/CDH 5.4, for example) the directory should look just like what you have now. We no longer push the assembly separately. By default, Spark uses the assembly installed on the nodes, which is faster than using the one from HDFS. The setting is still there so that custom assemblies can be used.

 

The setting should be entered without the hdfs: prefix; CM adds the prefix for you when it pushes the path out. Which version of CDH and CM are you using?

 

Wilfred
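
To illustrate the prefixing described in the reply above, here is a rough sketch of how an entered value could map to the pushed-out value. This is not CM's actual code: the function name `pushed_value` and the treatment of an empty (unset) value are assumptions made only for the example. The `hdfs:` form matches the path shown in the thread title.

```python
HDFS_SCHEME = "hdfs:"


def pushed_value(entered):
    """Return the value as it might be pushed out, given what was entered.

    Hypothetical helper: it only models the behaviour described in this
    thread and is not taken from Cloudera Manager.
    """
    entered = entered.strip()
    if not entered:
        # Assumption: an unset parameter means nothing is pushed, so the
        # assembly installed on the nodes is used instead.
        return ""
    if entered.startswith(HDFS_SCHEME):
        # Already prefixed; do not add the scheme a second time.
        return entered
    return HDFS_SCHEME + entered
```

With this sketch, entering `/user/spark/share/lib/spark-assembly.jar` yields `hdfs:/user/spark/share/lib/spark-assembly.jar`.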

Contributor

I have upgraded both CM and CDH to the 5.4 release.

 
