Expert Contributor
Posts: 277
Registered: ‎01-25-2017
Accepted Solution

Having Spark 1.6.0 and 2.1 in the same CDH

Hi Guys,

 

I'm planning to upgrade my CDH version to 5.10.2, and some of our developers need Spark 2.1 for Spark Streaming.

 

I'm planning to manage the two versions using Cloudera Manager: 1.6 will be the integrated one, and Spark 2.1 will be installed via parcels.

 

My questions:

 

1- Should I run Spark 2 as a service? Will Cloudera Manager let me have two Spark services, the regular one and the Spark 2.1 one?

 

2- Is it preferable to install the Spark 2 roles and gateways on the same servers as the regular ones? I assume the history server can run on a different host and use a different port; what will it look like when I add two gateways on the same DataNode?

 

3- Is it complicated to manage?

 

4- Is there any way the two versions could conflict and affect the current Spark jobs?

Cloudera Employee
Posts: 465
Registered: ‎08-11-2014

Re: Having Spark 1.6.0 and 2.1 in the same CDH

(Spark 2.2 at this point?)

The services you mention are really just the history server services. Yes, you can run both, for parallel Spark 1 and Spark 2 installations.

 

You probably want both gateway roles on all nodes that you intend to run both types of Spark jobs from, yeah, but that's up to you.

 

I don't think there's much else to know about managing them. The only thing to manage is the history server, and it's a simple creature that CM manages.

 

All the scripts are differently named (spark2-submit vs spark-submit) so there should be no conflict.
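To make the naming separation concrete, here is a small sketch of a wrapper that picks the launcher by major version. The helper function `submit_cmd` and the class/jar names in the usage comment are hypothetical; only the two script names come from the thread.

```shell
# Pick the right launcher by Spark major version. The Spark 2 parcel
# installs its scripts under a "spark2-" prefix, so the two sets of
# binaries never collide on a gateway host.
submit_cmd() {
  case "$1" in
    2) echo "spark2-submit" ;;  # Spark 2.x parcel
    *) echo "spark-submit"  ;;  # CDH-bundled Spark 1.6
  esac
}

# Illustrative usage (class and jar names are placeholders):
#   $(submit_cmd 2) --master yarn --class com.example.App app.jar
submit_cmd 1
submit_cmd 2
```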

Expert Contributor
Posts: 277
Registered: ‎01-25-2017

Re: Having Spark 1.6.0 and 2.1 in the same CDH

@srowen Thanks for your quick response.

 

In Cloudera Manager I will have two services, and each has its own separate configuration?

 

For the developers, is it seamless to just release the job with the needed dependencies?

 

I'd like to stay one GA release behind on the version.

Cloudera Employee
Posts: 465
Registered: ‎08-11-2014

Re: Having Spark 1.6.0 and 2.1 in the same CDH

Yes, you have two services for the history servers.

Yes, you need to build your app against Spark 1 or Spark 2 and then run it with the matching version.
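On a gateway host the separation is also visible on disk: each gateway role deploys its own client configuration, so neither stack reads the other's settings. The sketch below lists both config directories; the paths are the conventional CM deploy locations, not confirmed in the thread, so verify them on your cluster.

```shell
# List the client-config directories for both Spark services.
# /etc/spark/conf and /etc/spark2/conf are the conventional CM
# locations (an assumption here); each gateway role owns its own.
show_confs() {
  for conf in "$@"; do
    echo "== $conf =="
    ls "$conf" 2>/dev/null || echo "(not deployed on this host)"
  done
}

show_confs /etc/spark/conf /etc/spark2/conf
```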

Posts: 642
Topics: 3
Kudos: 105
Solutions: 67
Registered: ‎08-16-2016

Re: Having Spark 1.6.0 and 2.1 in the same CDH

@Fawze Your other questions have been answered, but I wanted to add this bit regarding:

"spark streaming."

Spark 2 comes with Structured Streaming, which is the new version of Spark Streaming. Cloudera currently doesn't support it because they view it as an experimental API. I haven't checked myself, but if it is still experimental, you run the risk of building apps on it that could break with each upgrade of Spark 2. Just a word of caution.

I am still in the testing phase, but so far no issues with running Spark 1 and Spark 2 on the same cluster. I have the Spark History Servers on different hosts, but that is more to spread the load. They run on different ports, and the configuration works out of the box. As mentioned, they are separate services with separate configs. I currently have the gateways on the same hosts.
Cloudera Employee
Posts: 465
Registered: ‎08-11-2014

Re: Having Spark 1.6.0 and 2.1 in the same CDH

Unsupported != doesn't work. Spark Streaming is shipped as-is, and you can use Structured Streaming. The distro wouldn't include breaking changes to public APIs, even where they're not supported.

New Contributor
Posts: 2
Registered: ‎07-09-2017

Re: Having Spark 1.6.0 and 2.1 in the same CDH

In the recently released Spark 2.2, the Structured Streaming APIs are now GA and no longer labeled experimental.

https://spark.apache.org/releases/spark-release-2-2-0.html#structured-streaming

 

When will CDH add support for Spark 2.2?

Cloudera Employee
Posts: 465
Registered: ‎08-11-2014

Re: Having Spark 1.6.0 and 2.1 in the same CDH

CDH already supports Spark 2.2, right?

New Contributor
Posts: 2
Registered: ‎07-09-2017

Re: Having Spark 1.6.0 and 2.1 in the same CDH

You are right, the CDH parcel for Spark 2.2 was released 10 days ago.

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html

 

Thanks!

Expert Contributor
Posts: 277
Registered: ‎01-25-2017

Re: Having Spark 1.6.0 and 2.1 in the same CDH

Can you please help?

 

[root@aopr-dhc001 bin]# spark2-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:104)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
[root@aopr-dhc001 bin]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
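A `NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream` at launch usually means the Hadoop client jars never made it onto the `spark2-shell` launcher's classpath, for example because the Spark 2 gateway role or its client configuration isn't deployed on this host. The sketch below is a first diagnostic, not a definitive fix: `SPARK_DIST_CLASSPATH` is the standard variable Spark reads for distribution-provided jars, and `hadoop classpath` prints them on a correctly configured gateway.

```shell
# Check whether the Hadoop jars are resolvable on this host, and if
# so point Spark at them via SPARK_DIST_CLASSPATH. If the hadoop CLI
# is missing, the gateway/client config likely isn't deployed here.
check_hadoop_classpath() {
  if command -v hadoop >/dev/null 2>&1; then
    export SPARK_DIST_CLASSPATH="$(hadoop classpath)"
    echo "SPARK_DIST_CLASSPATH set"
  else
    echo "hadoop CLI not found: deploy the gateway/client config first"
  fi
}

check_hadoop_classpath
```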
