
Have multiple Spark versions running on the cluster

Explorer

Hi,

Is there a way to have multiple Spark versions running on the cluster, and to specify which version to use at job startup?

Thanks.

Cheers,

Yann

1 ACCEPTED SOLUTION

Super Collaborator

I would use the Spark action as much out of the box (OOTB) as possible and leverage the sharelib, since it handles a number of things for you.

You can use multiple versions of the sharelib as described here; check the section on overriding the sharelib.

Wilfred

4 REPLIES

Mentor
Sure - Spark is, for the most part, a pure YARN application with few to no server-side components. As long as you submit your application with the right Spark tarball/binary, that Spark version will be used to run that particular application. Multiple Spark History Servers, if needed, can also be run with separate configs and ports.
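As a minimal sketch of that, assuming a Spark 1.6.x tarball unpacked under /opt and a hypothetical application class com.example.MyApp (all paths and versions below are placeholders, not anything specified in this thread):

    # Use a Spark build unpacked outside the cluster's default install.
    export SPARK_HOME=/opt/spark-1.6.3-bin-custom
    export HADOOP_CONF_DIR=/etc/hadoop/conf

    # Ship this version's assembly so the YARN containers run the same
    # Spark build as the driver (spark.yarn.jar is the Spark 1.x config;
    # Spark 2.x uses spark.yarn.archive / spark.yarn.jars instead).
    "$SPARK_HOME"/bin/spark-submit \
      --master yarn-cluster \
      --conf spark.yarn.jar=hdfs:///apps/spark/spark-assembly-1.6.3.jar \
      --class com.example.MyApp \
      myapp.jar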

Note that CDH ships only one Spark version, tied to its CDH version at build time. Running other Spark versions beyond the CDH-provided one is not formally covered by support (if you have a subscription).

Super Collaborator

The second version of Spark must be compiled against the CDH artifacts. You cannot pull down a generic build from a repository and expect it to work (we know it has issues). You would thus need to compile your own version of Spark against the correct CDH version. Using Spark from a later or earlier CDH release will most likely not work, due to changes in dependent libraries (i.e. the Hadoop or Hive version).
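A hedged sketch of such a build (the Spark tag, CDH version, and profile set are examples only; Cloudera's Maven repository must be resolvable from your build environment):

    # Build a Spark distribution against the CDH Hadoop artifacts rather
    # than a generic upstream Hadoop.
    git clone https://github.com/apache/spark.git
    cd spark
    git checkout v1.6.3

    # make-distribution.sh sits at the repo root in Spark 1.x
    # (dev/make-distribution.sh in Spark 2.x).
    ./make-distribution.sh --name cdh5.7-custom --tgz \
      -Pyarn -Phadoop-2.6 -Phive -Phive-thriftserver \
      -Dhadoop.version=2.6.0-cdh5.7.0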

 

For the shuffle service and the history server: both are backwards compatible, and only one of each is needed (running two is difficult and unnecessary). However, you must run and configure only the one that comes with the latest Spark version on your cluster.
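In practice that means the NodeManagers run the newest Spark's shuffle service once, and any job opts in on the client side - a sketch, again with a hypothetical application class:

    # The external shuffle service is registered once per NodeManager in
    # yarn-site.xml (yarn.nodemanager.aux-services includes spark_shuffle,
    # backed by org.apache.spark.network.yarn.YarnShuffleService), using
    # the jars from the newest Spark version on the cluster.
    # Jobs built on any Spark version then enable it at submit time:
    "$SPARK_HOME"/bin/spark-submit \
      --master yarn-cluster \
      --conf spark.shuffle.service.enabled=true \
      --conf spark.dynamicAllocation.enabled=true \
      --class com.example.MyApp \
      myapp.jar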

 

There is no formal support for this, and client configs will need manual work.

 

Wilfred

Explorer

Hi guys,

Thanks for the answers. Would you recommend using Oozie and referencing the Spark jars as in this post, or using the sharelib (which would result in a mess if I have, say, 3 Spark versions in that folder)?

Cheers,

Yann

Super Collaborator

I would use the Spark action as much out of the box (OOTB) as possible and leverage the sharelib, since it handles a number of things for you.

You can use multiple versions of the sharelib as described here; check the section on overriding the sharelib.
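A hedged sketch of the sharelib approach (the sharelib timestamp directory, subdirectory name, and Oozie URL are placeholders for your own values):

    # Add a second Spark version as its own sharelib subdirectory; the
    # lib_... timestamp directory is whatever your Oozie currently uses.
    hdfs dfs -mkdir -p /user/oozie/share/lib/lib_20160601000000/spark16
    hdfs dfs -put "$SPARK_HOME"/lib/spark-assembly-*.jar \
      /user/oozie/share/lib/lib_20160601000000/spark16/

    # Tell Oozie to rescan the sharelib.
    oozie admin -oozie http://oozie-host:11000/oozie -sharelibupdate

    # Per workflow, pick the directory in job.properties:
    # oozie.action.sharelib.for.spark=spark16

Because each version lives in its own subdirectory, three Spark versions do not end up mixed together in one folder.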

 

Wilfred