Created on 07-21-2017 01:16 PM - edited 09-16-2022 04:58 AM
Hi guys,
I'm planning to upgrade my CDH version to 5.10.2, and some of our developers need Spark 2.1 for Spark Streaming.
I'm planning to manage the two versions using Cloudera Manager: Spark 1.6 will be the integrated one, and Spark 2.1 will be installed via parcels.
My questions:
1- Should I run Spark 2 as a service? Will Cloudera Manager let me have two Spark services, the regular one and the Spark 2.1 one?
2- Is it preferable to install the Spark 2 roles and gateways on the same servers as the regular ones? I assume the History Server and Spark server can be on different servers, with a different port for the History Server. What will it look like when I add two gateways on the same DataNode?
3- Is it complicated to manage?
4- Is there a way the two versions could conflict and affect the current Spark jobs?
Created 07-21-2017 01:35 PM
(Spark 2.2 at this point?)
The services you mention are really just the History Server services. Yes, you can run Spark 1 and Spark 2 as parallel installations.
You probably want both gateway roles on all nodes you intend to submit both types of Spark jobs from, but that's up to you.
I don't think there's much else to know about managing them. The only thing to manage is the History Server, and it's a simple creature that CM handles.
All the scripts are named differently (spark2-submit vs spark-submit), so there should be no conflict.
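To illustrate the separately named CLIs, a hypothetical session on a node with both gateway roles deployed might look like this (the jar names and classes are made up for illustration):

```shell
# Hypothetical sketch on a host carrying both the Spark 1 and Spark 2
# gateway roles; jar and class names are illustrative only.

# Submit a job to the integrated Spark 1.6 service:
spark-submit --class com.example.LegacyJob legacy-job.jar

# Submit a job to the parcel-based Spark 2.x service:
spark2-submit --class com.example.StreamingJob streaming-job.jar

# Each installation also ships its own shell:
spark-shell    # Spark 1.6
spark2-shell   # Spark 2.x
```

Because the two launchers have distinct names, existing Spark 1 jobs and scripts keep working untouched while Spark 2 jobs opt in explicitly.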
Created 07-21-2017 01:39 PM
@srowen Thanks for your quick response.
So in Cloudera Manager I will have two services, each with its own configuration?
For the developers, is it seamless, just releasing the job with the needed dependencies?
I'd like to stay at GA-1 with respect to the version.
Created 07-21-2017 01:49 PM
Yes, you have two services for the History Servers.
Yes, you need to build your app against Spark 1 or Spark 2 and then run it with the matching version.
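As a sketch of what "build against the right version" means in practice, a hypothetical build.sbt might pin the Spark dependency per target (the version numbers here are illustrative upstream artifacts; CDH ships its own builds):

```scala
// Hypothetical build.sbt fragment; versions are illustrative.
// "provided" scope, because the cluster supplies Spark at runtime.

// For the Spark 1.6 service:
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0" % "provided"

// For the Spark 2.x service (note Spark 2 uses the Scala 2.11 cross-build):
// libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.0" % "provided"
```

The jar built against Spark 1 is then launched with spark-submit, and the jar built against Spark 2 with spark2-submit.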
Created 07-22-2017 12:34 AM
Unsupported != doesn't work. Spark Streaming is shipped as-is, and you can use Structured Streaming. The distro wouldn't include breaking changes to public APIs even where they're not supported.
Created 07-22-2017 08:23 PM
In the recently released Spark 2.2, the Structured Streaming APIs are now GA and no longer labeled experimental:
https://spark.apache.org/releases/spark-release-2-2-0.html#structured-streaming
When will CDH add support for Spark 2.2?
Created 07-23-2017 01:53 AM
CDH already supports Spark 2.2, right?
Created 07-23-2017 12:49 PM
You are right, the CDH parcel for Spark 2.2 was released 10 days ago.
https://www.cloudera.com/documentation/spark2/latest/topics/spark2_packaging.html
Thanks!
Created 07-24-2017 10:54 PM
Can you please help? spark2-shell fails on this node while the regular spark-shell works:
[root@aopr-dhc001 bin]# spark2-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:104)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
[root@aopr-dhc001 bin]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.6.0
/_/
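The NoClassDefFoundError for org/apache/hadoop/fs/FSDataInputStream above generally means the Hadoop client jars are not on Spark 2's classpath, e.g. because the Spark 2 gateway role and its client configuration were not deployed to this host. As a hedged diagnostic sketch (the SPARK_DIST_CLASSPATH mechanism comes from upstream Spark's "Hadoop free" builds and may not apply verbatim to a parcel install, where CM normally wires this up when the gateway client configuration is deployed):

```shell
# Diagnostic sketch; the exact remediation depends on the install.

# 1) Verify the Hadoop client jars are resolvable on this host:
hadoop classpath

# 2) Upstream "Hadoop free" Spark builds pick up the Hadoop jars via
#    SPARK_DIST_CLASSPATH (illustrative workaround; in CM, prefer
#    deploying the Spark 2 gateway client configuration instead):
export SPARK_DIST_CLASSPATH="$(hadoop classpath)"
spark2-shell
```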