
Have multiple Spark versions running on the cluster

Solved

New Contributor

Hi,

 

Is there a way to have multiple Spark versions running on the cluster, and to specify which version to use at job startup?

 

Thanks.

 

Cheers,

Yann

1 ACCEPTED SOLUTION


Re: Have multiple Spark versions running on the cluster

Super Collaborator

I would use the Spark action as much out of the box (OOTB) as possible and leverage the sharelib, since it handles a number of things for you.

You can use multiple versions of the sharelib as described here; check the section on overriding the sharelib.
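The sharelib override is set per workflow in its job.properties. A minimal sketch of what that looks like (the property names are Oozie's; the sharelib name "spark2" is an assumption, use whatever name the second Spark sharelib was installed under):

```
# job.properties for the Oozie workflow
oozie.use.system.libpath=true
# Point the spark action at an alternative sharelib directory;
# "spark2" is a hypothetical name for the second Spark version's sharelib.
oozie.action.sharelib.for.spark=spark2
```

Each workflow can then pick its own Spark version just by changing this one property.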

 

Wilfred

 


Re: Have multiple Spark versions running on the cluster

Master Guru
Sure - Spark is a pure YARN app for the most part, with few to no server-side components. As long as you submit your application with the right Spark tarball/binaries, that Spark version will be the one used to run that application. Multiple Spark History Servers, if needed, can also be run with separate configs and ports.
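As a sketch, assuming two hypothetical unpack locations for the tarballs: picking a version at submit time just means calling that version's own launcher (shown with `echo` here so the resulting command can be inspected without a cluster):

```shell
#!/bin/sh
# Hypothetical install locations for two Spark tarballs -- adjust to
# wherever you unpacked them on the gateway host.
SPARK_A=/opt/spark-1.5.2
SPARK_B=/opt/spark-1.6.0

# Choose the version per job simply by using that version's launcher.
SPARK_HOME="$SPARK_B"
echo "${SPARK_HOME}/bin/spark-submit --master yarn --deploy-mode cluster app.jar"
# prints: /opt/spark-1.6.0/bin/spark-submit --master yarn --deploy-mode cluster app.jar
```

Dropping the `echo` runs the real submission; each job is tied to whichever `SPARK_HOME` its launcher came from.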

Note that CDH ships only one Spark version, bound to its CDH release by build. Formal support for Spark versions other than the CDH-provided one is not covered (if you have a subscription).

Re: Have multiple Spark versions running on the cluster

Super Collaborator

 

The second version of Spark must be compiled against the CDH artifacts. You cannot pull down a generic build from a repository and expect it to work (we know it has issues). You would thus need to compile your own version of Spark against the correct version of CDH. Using Spark from a later or earlier CDH release will not work, most likely due to changes in dependent libraries (i.e. the Hadoop or Hive version).

 

For the shuffle service and the history service: both are backwards compatible and only one of each is needed (running two is difficult and not needed). However, you must run/configure only the one that comes with the latest Spark version in your cluster.
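As a sketch of what the single shuffle service looks like in the NodeManager config (these are the standard Spark-on-YARN property names; registering it once, with the jar from the newest Spark version on the cluster, is the point made above):

```
<!-- yarn-site.xml: register Spark's external shuffle service as a
     NodeManager auxiliary service. Only one spark_shuffle entry is
     configured, taken from the newest Spark version on the cluster. -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```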

 

There is no formal support for this and client configs will need manual work...

 

Wilfred

Re: Have multiple Spark versions running on the cluster

New Contributor

Hi guys,

 

Thanks for the answers. Would you recommend using Oozie and referencing the Spark jars like in this post, or using the sharelib (which would result in a mess if I have, say, 3 Spark versions in that folder)?

 

Cheers,

Yann

