Created on 02-19-2016 09:02 AM - edited 09-16-2022 03:04 AM
Hi,
Is there a way to have multiple Spark versions running on the cluster, and to specify which version to use at job startup?
Thanks.
Cheers,
Yann
Created 02-25-2016 07:19 PM
The second version of Spark must be compiled against the CDH artifacts. You cannot pull down a generic build from a repository and expect it to work (we know it has issues). You would thus need to compile your own version of Spark, and do so against the correct version of CDH. Using Spark from a later or earlier CDH release will not work, most likely due to changes in dependent libraries (i.e. the Hadoop or Hive version).
For the shuffle service and the history service: both are backwards compatible and only one of each is needed (running two is difficult and not needed). However, you must run/configure only the one that comes with the latest version of Spark in your cluster.
There is no formal support for this and client configs will need manual work...
Wilfred
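As a rough sketch of such a build: Spark's `make-distribution.sh` script can be pointed at a specific Hadoop version. The CDH version string and profile names below are placeholder assumptions for a CDH 5.x era cluster, not a tested combination; substitute the exact artifact versions your cluster runs.

```shell
# Build a Spark distribution against a specific CDH Hadoop version.
# -Dhadoop.version uses an example CDH artifact version (assumption);
# replace it with the version that matches your cluster.
./dev/make-distribution.sh --name custom-cdh --tgz \
  -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 \
  -Phive -Phive-thriftserver -Pyarn
```

You may also need Cloudera's Maven repository configured in the build's `pom.xml` so that the CDH-versioned Hadoop and Hive artifacts can be resolved.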
Created 02-27-2016 03:30 AM
Hi guys,
Thanks for the answers. Would you recommend using Oozie and referencing the Spark jars like in this post, or using the sharelib (which would result in a mess if I have, say, 3 Spark versions in that folder)?
Cheers,
Yann
Created 02-29-2016 01:14 AM
I would use the Spark action as much out of the box (OOTB) as possible and leverage the sharelib, since it handles a number of things for you.
You can use multiple versions of the sharelib as described here; check the section on overriding the sharelib.
Wilfred
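A sketch of how that override can look, assuming the second Spark's jars are placed in a directory named `spark2` (the name is an assumption; any directory under the sharelib works): upload the jars next to the default `spark` sharelib, then refresh the Oozie server's view of the sharelib.

```shell
# Upload the second Spark version's jars as an extra sharelib directory.
# "lib_<timestamp>" is the current sharelib directory on HDFS; use the
# one your Oozie server reports via: oozie admin -shareliblist
hdfs dfs -put spark2-jars /user/oozie/share/lib/lib_<timestamp>/spark2

# Tell the Oozie server to pick up the new sharelib contents.
oozie admin -sharelibupdate
```

In a given workflow's `job.properties`, setting `oozie.action.sharelib.for.spark=spark2` (together with `oozie.use.system.libpath=true`) then makes the Spark action use the `spark2` directory instead of the default `spark` one, so each job can select its Spark version at submission time.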