Explorer
Posts: 8
Registered: ‎07-27-2016

Cloudera plans on Spark 2.0.0


Hello,

 

I don't see any updates regarding Spark 2.0.0 on the product matrix. Since Spark 2.0.0 is out, I'm wondering when Cloudera plans to release support for it. As of CDH 5.8 I can only see 1.6.0. A related question: is Cloudera planning to jump straight to 2.0.0, or move to 1.6.1 first and then to 2.0.0?

 

Could someone please address this?

 

Thanks,

RK

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: Cloudera plans on Spark 2.0.0

No formal announcement yet, but as you can imagine, it can't be long before it is available.

A major release line generally can't include breaking changes, and Spark 2 makes breaking changes. The base Spark has to stay 1.6, but that doesn't mean 2.0 can't also be available optionally.

CDH is already effectively on 1.6.2.
Explorer
Posts: 8
Registered: ‎07-27-2016

Re: Cloudera plans on Spark 2.0.0

Thanks.

 

However,

http://www.cloudera.com/documentation/enterprise/release-notes/topics/cdh_rn_fixed_in_57.html#concep... lists the following issue as fixed:

  • SPARK-4452 - Shuffle data structures can starve others on the same thread for memory --> upstream, the fix is only available in Spark 2.0.0

On the other hand,

  • SPARK-13622 - Issue creating level db for YARN shuffle service --> is fixed upstream in both 1.6.2 and 2.0.0

 

Did Cloudera backport these patches without an actual release of Spark 2.0.0?

 

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: Cloudera plans on Spark 2.0.0

Yes, the CDH maintenance patch set can always differ from upstream for any project. That includes backporting fixes as appropriate, even when whoever merged a fix upstream didn't backport it into the corresponding upstream maintenance branch. Likewise, an upstream project may merge a change into a maintenance branch when it probably wasn't the right thing to do, and CDH would not do so.

Explorer
Posts: 19
Registered: ‎12-18-2014

Re: Cloudera plans on Spark 2.0.0

I'm on 

5.8.0-1.cdh5.8.0.p0.42

The Spark version reported is 1.6.0.

 

$ spark-submit --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

 

$ spark-shell --version
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/


How do I upgrade to 1.6.2 like you said in your earlier post?

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: Cloudera plans on Spark 2.0.0

The version will always read x.y.0 even though it contains patches on top of x.y.0. It would probably be a little nicer if this version were something like "1.6.0-CDH-5.7.1", because that's how the Maven artifacts are named, specifically to make clear that it's not the same as upstream x.y.0. The list of exact patches is always available in the release notes. My point is that you already have a version here with many changes on top of upstream 1.6.0, which ought to be similar to (but not necessarily identical to) the patches between 1.6.0 and 1.6.2.
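For example, you can see the CDH build version on the jar names themselves rather than from spark-submit --version. A minimal check, assuming a typical parcel install (the path and exact jar name below may differ on your cluster):

$ # list the Spark core jar shipped with the CDH parcel; its name carries the CDH version
$ ls /opt/cloudera/parcels/CDH/jars/ | grep spark-core
spark-core_2.10-1.6.0-cdh5.8.0.jar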

Explorer
Posts: 19
Registered: ‎12-18-2014

Re: Cloudera plans on Spark 2.0.0

OK @srowen, thank you for the explanation.

Explorer
Posts: 21
Registered: ‎10-22-2015

Re: Cloudera plans on Spark 2.0.0

Hi

 

You strictly don't need to wait for Cloudera to release Spark 2.0.0. Since Spark can be run as a YARN application, it is possible to run a Spark version other than the one that comes bundled with the Cloudera distribution. This requires no administrator privileges and no changes to the cluster configuration, and it can be done by any user who has permission to run a YARN job on the cluster. A YARN application ships all its dependencies over to the cluster for each invocation, so you can run multiple Spark versions simultaneously on a YARN cluster. Each version of Spark is self-contained in the user workspace on the edge node, and running a new Spark version will not affect any other jobs running on your cluster.

 

Essentially, you download and extract spark-2.0.0 on the edge node, copy your existing cluster configuration and hive-site.xml into the new configuration directory, and run spark-shell from the new location; see the sketch below. This should work out of the box. There are a few optional configuration tweaks; you will find detailed instructions on how to apply them if you search for them.
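A minimal sketch of those steps, assuming a typical CDH edge node (the download mirror, Hadoop build, and client-config paths below are assumptions; adjust them for your environment):

$ # download and unpack a prebuilt Spark 2.0.0 into your user workspace
$ wget https://archive.apache.org/dist/spark/spark-2.0.0/spark-2.0.0-bin-hadoop2.6.tgz
$ tar -xzf spark-2.0.0-bin-hadoop2.6.tgz
$ # reuse the cluster's existing Hadoop and Hive client configuration
$ cp /etc/hive/conf/hive-site.xml spark-2.0.0-bin-hadoop2.6/conf/
$ export HADOOP_CONF_DIR=/etc/hadoop/conf
$ # run the new version on YARN; the bundled CDH Spark is untouched
$ ./spark-2.0.0-bin-hadoop2.6/bin/spark-shell --master yarn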

 

Note, though, that jobs running on Spark 2.0.0 may not be supported by Cloudera until its official release.

 

Regards

Deenar

 

 

Explorer
Posts: 8
Registered: ‎07-27-2016

Re: Cloudera plans on Spark 2.0.0

@DeenarT: The question is not whether you can point to a separate Spark distro, but rather whether it's natively available in CDH or not. On a side note, Apache Spark 2.0.0 has already been released; not sure if you meant Cloudera's version.

New Contributor
Posts: 1
Registered: ‎08-23-2016

Re: Cloudera plans on Spark 2.0.0

Hello,

 

I hear CDH 5.9 will come with Spark 2.0.0.

 

What's the ETA for CDH 5.9?

 

Thanks