Created on 07-21-2017 01:16 PM - edited 09-16-2022 04:58 AM
Hi guys,
I'm planning to upgrade my CDH version to 5.10.2, and some of our developers need Spark 2.1 for Spark Streaming.
I plan to manage the two versions with Cloudera Manager: Spark 1.6 as the integrated service and Spark 2.1 via parcels.
My questions:
1- Should I add Spark 2 as a service? Will Cloudera Manager let me have two Spark services, the regular one and the Spark 2.1 one?
2- Is it preferable to install the Spark 2 roles and gateways on the same servers as the regular ones? I assume the Spark server and the History Server can be on different servers, with a different port for the History Server. What will it look like when I add two gateways on the same DataNode?
3- Is it complicated to manage?
4- Is there a way the two versions could conflict and affect the current Spark jobs?
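On question 4, a quick sanity check is possible from any gateway host: the CDH-bundled Spark 1.6 and the SPARK2 parcel ship distinct client commands (spark-submit vs. spark2-submit) and distinct config directories (/etc/spark/conf vs. /etc/spark2/conf), so the two versions do not collide on the command line. A minimal sketch, assuming a standard parcel install:

```shell
# Check which Spark clients are on the PATH. Spark 1.6 and Spark 2 use
# separate command names, so both can coexist on the same gateway host.
for cmd in spark-submit spark2-submit; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd found: $(command -v "$cmd")"
  else
    echo "$cmd not on PATH"
  fi
done
```

If both commands resolve, jobs submitted with spark-submit keep using Spark 1.6 while spark2-submit jobs use the parcel's Spark 2.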
Created 07-25-2017 11:40 AM
Yes
Created 07-26-2017 06:05 AM
[root@aopr-dhc001 bin]# spark2-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:104)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 7 more
This error is almost always the result of not having the Spark 2 gateway role installed on the host from which you're running spark2-shell (CM > Spark2 > Instances). I'd also double-check that the client configuration is correctly deployed (CM > cluster name drop-down menu > Deploy Client Configuration).
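A hedged way to check this from the shell on the affected host (paths assume CM's standard client-config layout; this is a sketch, not a substitute for the CM steps above):

```shell
# If a Spark2 gateway role exists and client configs are deployed,
# /etc/spark2/conf is present and its spark-env.sh wires up the Hadoop
# classpath - the missing piece behind the FSDataInputStream
# NoClassDefFoundError above.
if [ -d /etc/spark2/conf ]; then
  echo "spark2 client config present"
  grep -q HADOOP_CONF_DIR /etc/spark2/conf/spark-env.sh \
    && echo "spark-env.sh sets HADOOP_CONF_DIR"
else
  echo "spark2 client config missing - add a Spark2 gateway role and deploy client configs"
fi
```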
Created 07-26-2017 07:52 AM
I assume you meant Spark2 gateway is deployed on all nodes (?)
Please share the output of the following command from aopr-dhc001 (the gateway node):
alternatives --display spark2-conf
and a screenshot of CM > Hosts > All Hosts > aopr-dhc001 > Expand Roles (showing the Spark2 gateway role).
Created 07-26-2017 11:49 AM
The roles only show a Spark gateway for aopr-dhc001 (and not a Spark2 gateway)! What you really need is a Spark2 gateway (to invoke spark2-shell). Assuming you are using Cloudera Manager to manage your environment, please navigate to CM > Spark2 > Instances > Add Role Instances, add "aopr-dhc001" as a gateway, and then deploy the client configuration.
I have attached a few screenshots from my lab to clarify what I mean.
After installing the Spark2 gateway and deploying the client configuration, your alternatives link will automatically point to /etc/spark2/conf (required for running spark2-shell):
# alternatives --display spark2-conf
spark2-conf - status is auto.
link currently points to /etc/spark2/conf.cloudera.spark2_on_yarn
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/etc/spark2/conf.dist - priority 10
/etc/spark2/conf.cloudera.spark2_on_yarn - priority 51
Current `best' version is /etc/spark2/conf.cloudera.spark2_on_yarn.
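In auto mode, alternatives simply picks the registered candidate with the highest priority. A minimal sketch of that selection rule, reusing the paths and priorities from the output above (illustrative only, not how alternatives is implemented):

```shell
# Simulate alternatives' auto-mode selection: highest priority wins.
# Paths and priorities mirror the `alternatives --display spark2-conf`
# output shown above.
best_path=""; best_prio=-1
while read -r path prio; do
  if [ "$prio" -gt "$best_prio" ]; then
    best_path=$path
    best_prio=$prio
  fi
done <<'EOF'
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/etc/spark2/conf.dist 10
/etc/spark2/conf.cloudera.spark2_on_yarn 51
EOF
echo "$best_path"   # prints /etc/spark2/conf.cloudera.spark2_on_yarn
```

This is why the CM-deployed client config (priority 51) wins over the parcel's bundled conf.dist (priority 10).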
Hope this helps.
Created 07-26-2017 01:39 PM
Something is going wrong and I may have missed it.
The service I added to the cluster is called Spark (Standalone); I don't see it as Spark2 on YARN.
Also, when I navigate to the History Server, it shows the version as Spark 1.6.
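One way to confirm which Spark service types the cluster actually has is the CM REST API. A hedged sketch: the host, port, credentials, and cluster name below are placeholders, and the API version (v14 here) should be checked against GET /api/version on your CM server:

```shell
# List the cluster's services and extract their types. A Spark2 deployment
# from the SPARK2 parcel appears as type SPARK2_ON_YARN; the CDH-bundled
# Spark service appears as SPARK (standalone) or SPARK_ON_YARN.
# Placeholders: CM host, admin credentials, cluster name, API version.
CM_HOST="cm-host.example.com"
curl -s --max-time 5 -u admin:admin \
  "http://${CM_HOST}:7180/api/v14/clusters/Cluster%201/services" \
  | grep -o '"type" *: *"[A-Z0-9_]*"' \
  || echo "could not reach CM at ${CM_HOST}"
```

If the list shows only SPARK and no SPARK2_ON_YARN, the Spark2 service itself was never added, which would explain the History Server reporting 1.6.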