
Having Spark 1.6.0 and 2.1 in the same CDH

Master Collaborator

Hi Guys,

 

I'm planning to upgrade my CDH version to 5.10.2, and some of our developers need Spark 2.1 for Spark Streaming.

 

I'm planning to manage the 2 versions using Cloudera Manager: 1.6 will be the integrated one, and Spark 2.1 will be installed with parcels.

 

My questions:

 

1- Should I use Spark 2 as a service? Will Cloudera Manager let me have 2 Spark services, the regular one and the Spark 2.1 one?

 

2- Is it preferable to install the Spark 2 roles and gateways on the same servers as the regular one? I assume the history server and Spark server can be on different servers, with a different port for the history server. What will it look like when I add 2 gateways on the same DN?

 

3- Is it complicated to manage?

 

4- Is there a way the 2 versions could conflict and affect the current Spark jobs?

1 ACCEPTED SOLUTION

Champion
Did you restart CM and CMS?

If not, then it will not pick up the CSD file, and Spark 2 will not be available as a service to install.

If you have, for the cluster with the parcels distributed and activated, choose 'Add a Service' from the cluster action menu. Is it available in that list of services?
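As a rough sketch of the check described above, the helper below tests whether a Spark 2 CSD jar is present before you restart. The function name `has_spark2_csd` is hypothetical, and the default directory `/opt/cloudera/csd` and the `SPARK2_ON_YARN*.jar` file name pattern are assumptions that may differ on your installation.

```shell
# Hypothetical helper: succeeds if a Spark 2 CSD jar exists in the given directory.
# Assumptions: default CSD directory /opt/cloudera/csd, jar named SPARK2_ON_YARN*.jar.
has_spark2_csd() {
  dir="${1:-/opt/cloudera/csd}"
  for f in "$dir"/SPARK2_ON_YARN*.jar; do
    # If nothing matches, the glob stays literal and the -e test fails.
    [ -e "$f" ] && return 0
  done
  return 1
}

# Example use on the Cloudera Manager host (not run here):
#   if has_spark2_csd; then
#     sudo service cloudera-scm-server restart   # restart CM so it loads the CSD
#   fi
# The Cloudera Management Service (CMS) is then restarted from the CM UI.
```

If the helper reports no jar, restarting will not make the service appear under 'Add a Service'.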


29 REPLIES

Master Collaborator

(Spark 2.2 at this point?)

The services you mention are really the history server services only. You can run both, for Spark 1 and Spark 2 parallel installations, yes. 

 

You probably want both gateway roles on all nodes that you intend to run both types of Spark jobs from, yeah, but that's up to you.

 

I don't think there's anything else to know about managing them. The only thing to manage is the history server and it's a simple creature that CM manages.

 

All the scripts are differently named (spark2-submit vs spark-submit) so there should be no conflict.
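To illustrate the naming split, here is a minimal sketch that maps a Spark major version to the launcher named in this thread. The function `spark_submit_for`, the class name, and the jar name are hypothetical placeholders, not part of CDH.

```shell
# Hypothetical helper: print the launcher script for a Spark major version.
# spark-submit belongs to Spark 1; spark2-submit belongs to the Spark 2 parcel.
spark_submit_for() {
  case "$1" in
    1) echo "spark-submit" ;;
    2) echo "spark2-submit" ;;
    *) echo "unsupported Spark major version: $1" >&2; return 1 ;;
  esac
}

# Example use (class and jar names are placeholders, not run here):
#   "$(spark_submit_for 2)" --class com.example.MyApp my-app.jar
```

Because each version has its own launcher, a job script can pin the version it was built against without touching the other installation.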

Master Collaborator

@srowen Thanks for your quick response.

 

In Cloudera Manager, will I have 2 services, each with its own configuration?

 

For the developers, is it seamless, just a matter of releasing the job with the needed dependencies?

 

I prefer to stay on GA-1 for the version.

Master Collaborator

Yes, you have two services for the history servers.

Yes, you need to build your app against Spark 1 or Spark 2 and then run it with the right version.

Champion
@Fawze Your other questions have been answered, but I wanted to add this bit regarding:

"spark streaming."

Spark 2 comes with Structured Streaming, which is the new version of Spark Streaming. Cloudera currently doesn't support it because it views it as an experimental API. I haven't checked myself, but if it is experimental, you run the risk of building apps on it that could break with each Spark 2 upgrade. Just a word of caution.

I am still in the testing phase, but so far I have had no issues running Spark 1 and Spark 2 on the same cluster. I have the Spark History Servers on different hosts, but that is mostly to spread the load. They run on different ports, and the configuration works out of the box. As mentioned, they are separate services with separate configs. I currently have the gateways on the same host.

Master Collaborator

Unsupported != doesn't work. Spark Streaming is shipped as-is and you can use structured streaming. The distro wouldn't include breaking changes to public APIs even where not supported.

New Contributor

In the recently released Spark 2.2, the Structured Streaming APIs are now GA and are no longer labeled experimental.

https://spark.apache.org/releases/spark-release-2-2-0.html#structured-streaming

 

When will CDH add support for Spark 2.2?

Master Collaborator

CDH already supports Spark 2.2, right?

New Contributor

You are right, the CDH parcel for Spark 2.2 was released 10 days ago.

https://www.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html

 

Thanks!

Master Collaborator

Can you please help? spark2-shell fails with the error below, while spark-shell starts normally:

 

[root@aopr-dhc001 bin]# spark2-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:104)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
[root@aopr-dhc001 bin]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/