Our organization uses CDH 5.4 with Spark 1.6. One of our biggest pain points is that the Cloudera platform does not support Scala 2.11.
I was under the impression that a newer release of CDH might address this issue, but upon googling I found that even the latest release doesn't support Scala 2.11.
I want to know why a company like Cloudera can't support a language version that has been out there for quite some time, and whether we can expect Cloudera to support Scala 2.11 anytime soon.
Our team has been working on Scala projects for several years, and it's not possible for us to move back to Scala 2.10, so we really can't use your platform.
No, even Spark 2.x upstream still supports Scala 2.10.
The problem is that Scala 2.10 and 2.11 are binary-incompatible, so you can't support both at once, and updating the version would be a breaking change. It's not possible to make such a big breaking change in a minor release of the product.
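For application code caught between the two Scala lines, one common workaround (a sketch, assuming an sbt build; the exact version numbers here are illustrative) is to cross-build the same codebase for both binary versions:

```scala
// build.sbt (sketch): cross-compile one project for both Scala 2.10 and
// 2.11 so the same code can run against CDH's Spark 1.x (2.10) and a
// Spark 2 build on 2.11. `sbt +package` then produces one artifact per
// Scala binary version, each with the matching _2.10/_2.11 suffix.
scalaVersion := "2.11.8"
crossScalaVersions := Seq("2.10.6", "2.11.8")
```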
CDH 6 will be on Scala 2.11, though.
Yes, the Spark 2 release in CDH will require Scala 2.11. The existing Spark 1.x support will stay on Scala 2.10, because we provide long-term support for the platform and can't make a breaking change in the default distribution. I think that's what you're after, right? You want 2.11 support.
Thanks for the quick reply.
Let me make sure I understand correctly: you mean the official Spark 2 release in the future, NOT the Spark 2 beta currently available, right? If the Spark 2 beta parcel requires Scala 2.11, then it can't be installed on any current CDH, because none supports Scala 2.11.
No, all of the current supported components on CDH 5.x use Scala 2.10. The Spark 2 beta right now uses Scala 2.11. It is unsupported right now. I suspect that when it is supported, there will still be some caveats about what it can interact with because of the Scala version mismatch. Although Spark 2.0.x upstream still supports Scala 2.10 at the moment, the Spark 2 distribution here will require 2.11.
So then, since Spark 2 Beta has a hard dependency on Scala 2.11, Cloudera should also provide some guidance for installing the official package onto CDH ourselves, correct? It seems to be a simple package-add, but we'd like some assurance that it won't break CDH.
Spark is basically an application. It can bundle its own Scala, so its presence doesn't really cause trouble for anything else. It becomes a problem only when importing other client libraries that don't work with the same Scala version, or when trying to embed it from another process that somehow uses a different Scala version, but that's rare. Scala libraries are shipped in CDH, both 2.10 and 2.11.
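One practical consequence of the binary incompatibility: Scala libraries encode the Scala binary version as a suffix in the artifact name (e.g. `spark-core_2.10-1.6.0.jar` vs `spark-core_2.11-2.0.0.jar`), so you can tell from a jar's file name which Scala line it targets. A small sketch of pulling that suffix out in the shell (the jar name below is hypothetical):

```shell
# Extract the Scala binary version from a Scala artifact's file name.
jar="spark-core_2.11-2.0.0.jar"   # hypothetical example jar
ver="${jar#*_}"                   # drop the artifact name up to the underscore -> "2.11-2.0.0.jar"
ver="${ver%%-*}"                  # keep only the Scala binary version        -> "2.11"
echo "$ver"                       # prints 2.11
```

A client library whose suffix doesn't match the Spark build's Scala version is exactly the mismatch case described above.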
Thanks. We have 5.7.1, which seems to only come with Scala 2.10 - do we need to upgrade?
$ ls $CLOUDERA_HOME/jars/*scala*
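Independent of which jars sit on disk, you can also ask the running shell itself which Scala it was built with; a sketch, run inside spark-shell or any Scala REPL (`scala.util.Properties` is part of the Scala standard library):

```scala
// Print the Scala version string of the current runtime,
// e.g. "version 2.10.5" on a Spark 1.x shell from CDH.
println(scala.util.Properties.versionString)
```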