07-21-2017 01:16 PM
I'm planning to upgrade my CDH version to 5.10.2, and some of our developers need Spark 2.1 for Spark Streaming.
I'm planning to manage the two versions using Cloudera Manager: 1.6 will be the integrated one, and Spark 2.1 will be installed from parcels.
1- Should I use Spark 2 as a service? Will I be able to have two Spark services, the regular one and the Spark 2.1 one?
2- Is it preferable to install the Spark 2 roles and gateways on the same servers as the regular ones? I assume the history server and Spark server can be on different servers, with a different port for the history server. What will it look like when I add two gateways on the same DN?
3- Is it complicated to manage?
4- Is there a way the two versions could conflict and affect the current Spark jobs?
07-21-2017 01:35 PM
(Spark 2.2 at this point?)
The services you mention are really the history server services only. You can run both, for Spark 1 and Spark 2 parallel installations, yes.
You probably want both gateway roles on all nodes that you intend to run both types of Spark jobs from, yeah, but that's up to you.
I don't think there's anything else to know about managing them. The only thing to manage is the history server and it's a simple creature that CM manages.
All the scripts are differently named (spark2-submit vs spark-submit) so there should be no conflict.
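To illustrate the naming point, here is a minimal Python sketch that picks the launcher by Spark major version and reports which launchers are on the PATH of the current host. The script names come from this thread; the helper names (launcher_for, build_command, installed_launchers) and the job file name in the usage note are hypothetical, and the sketch only builds a command line without running it.

```python
import shutil

# Launcher names as mentioned in this thread; the version-to-name mapping
# is an illustration, not something Cloudera Manager provides.
LAUNCHERS = {1: "spark-submit", 2: "spark2-submit"}


def launcher_for(major_version):
    """Return the submit script name for a Spark major version."""
    if major_version not in LAUNCHERS:
        raise ValueError("no launcher known for Spark %r" % (major_version,))
    return LAUNCHERS[major_version]


def build_command(major_version, app, *app_args):
    """Build an argv list for the chosen launcher (does not execute it)."""
    return [launcher_for(major_version), app] + list(app_args)


def installed_launchers():
    """Map each launcher name to its path on this host, or None if not on PATH."""
    return {name: shutil.which(name) for name in LAUNCHERS.values()}
```

For example, build_command(2, "streaming_job.py") yields a command that starts with spark2-submit, and installed_launchers() shows None for any launcher whose gateway is not deployed on the host.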
07-21-2017 01:39 PM
@srowen Thanks for your quick response.
In Cloudera Manager, will I have two services, each with its own configuration?
For the developers, is it seamless, just submitting the job with the needed dependencies?
I prefer to stay at GA-1 for the version.
07-21-2017 11:17 PM
07-22-2017 12:34 AM
Unsupported != doesn't work. Spark Streaming is shipped as-is and you can use structured streaming. The distro wouldn't include breaking changes to public APIs even where not supported.
07-22-2017 08:23 PM
In the recently released Spark 2.2, the Structured Streaming APIs are now GA and are no longer labeled experimental.
When will CDH add support for Spark 2.2?
07-23-2017 12:49 PM
You are right, the CDH parcel for Spark 2.2 was released 10 days ago.
07-24-2017 10:54 PM
Can you please help? spark2-shell fails with the error below, while spark-shell starts:
[root@aopr-dhc001 bin]# spark2-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
... 7 more
[root@aopr-dhc001 bin]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0