Created on 02-19-2016 09:02 AM - edited 09-16-2022 03:04 AM
Hi,
Is there a way to have multiple Spark versions running on the cluster, and to specify which version to use at job startup?
Thanks.
Cheers,
Yann
Created 02-25-2016 07:19 PM
The second version of Spark must be compiled against the CDH artifacts. You cannot pull down a generic build from a repository and expect it to work (we know it has issues). You would thus need to compile your own version of Spark, and do so against the correct version of CDH. Using Spark from a later or earlier CDH release will not work, most likely due to changes in dependent libraries (i.e. the Hadoop or Hive version).
For the shuffle service and the history service: both are backwards compatible and only one of each is needed (running two is difficult and not needed). However, you must run/configure only the one that comes with the latest version of Spark in your cluster.
There is no formal support for this and client configs will need manual work...
Wilfred
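As a rough sketch of such a build: Spark's `make-distribution.sh` script can be pointed at a specific Hadoop version. The CDH version string and profile names below are placeholder assumptions for a CDH 5.x era cluster, not a tested combination; substitute the exact artifact versions your cluster runs.

```shell
# Build a Spark distribution against a specific CDH Hadoop version.
# -Dhadoop.version uses an example CDH artifact version (assumption);
# replace it with the version that matches your cluster.
./dev/make-distribution.sh --name custom-cdh --tgz \
  -Phadoop-2.6 -Dhadoop.version=2.6.0-cdh5.7.0 \
  -Phive -Phive-thriftserver -Pyarn
```

You may also need Cloudera's Maven repository configured in the build's `pom.xml` so that the CDH-versioned Hadoop and Hive artifacts can be resolved.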
Created 02-27-2016 03:30 AM
Hi guys,
Thanks for the answers. Would you recommend using Oozie and referencing the Spark jars like in this post, or using the sharelib (which would result in a mess if I have, say, 3 Spark versions in that folder)?
Cheers,
Yann
Created 02-29-2016 01:14 AM
I would use the Spark action as much out of the box (OOTB) as possible and leverage the sharelib, since it handles a number of things for you.
You can use multiple versions of the sharelib as described here; check the section on overriding the sharelib.
Wilfred
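A sketch of how that override can look, assuming the second Spark's jars are placed in a directory named `spark2` (the name is an assumption; any directory under the sharelib works): upload the jars next to the default `spark` sharelib, then refresh the Oozie server's view of the sharelib.

```shell
# Upload the second Spark version's jars as an extra sharelib directory.
# "lib_<timestamp>" is the current sharelib directory on HDFS; use the
# one your Oozie server reports via: oozie admin -shareliblist
hdfs dfs -put spark2-jars /user/oozie/share/lib/lib_<timestamp>/spark2

# Tell the Oozie server to pick up the new sharelib contents.
oozie admin -sharelibupdate
```

In a given workflow's `job.properties`, setting `oozie.action.sharelib.for.spark=spark2` (together with `oozie.use.system.libpath=true`) then makes the Spark action use the `spark2` directory instead of the default `spark` one, so each job can select its Spark version at submission time.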