Created on 11-09-2017 12:21 AM - edited 09-16-2022 05:30 AM
I have installed Spark2 through Cloudera Manager. The service was added and shows OK status, although Cloudera Manager did not seem to deploy it properly. Now running 'spark2-shell' shows an error message:
[u_m1@cm bin]# ./spark2-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:104)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:117)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
How can I deploy the service correctly? Any recommendations?
Created 11-10-2017 06:40 PM
This error is almost always the result of not having the Spark2 gateway role configured on the host from which you're trying to run spark2-shell (CM > Spark2 > Instances > Gateway). I'd make sure that the steps to add the Spark2 service, including the CSD, were followed correctly, including a restart of CM and CMS, and I'd double check that the client configuration is correctly deployed (CM > Cluster Name drop-down menu > Deploy Client Configuration).
If all is well, you should see the alternatives pointing to /etc/spark2/conf... (required for running spark2-shell):
[u_m1@cm bin]# alternatives --display spark2-conf
spark2-conf - status is auto.
link currently points to /etc/spark2/conf.cloudera.spark2_on_yarn
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/etc/spark2/conf.dist - priority 10
/etc/spark2/conf.cloudera.spark2_on_yarn - priority 51
Current `best' version is /etc/spark2/conf.cloudera.spark2_on_yarn.
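The same check can be scripted. The following is only a minimal sketch: the function names spark2_conf_link and is_deployed_conf are made up for illustration, and the parsing assumes the "link currently points to" line format shown in the output above.

```shell
# Hypothetical helpers; the names are illustrative and not part of CM or the parcel.

# Print the path the alternatives link currently points to.
# Reads the output of `alternatives --display <name>` on stdin.
spark2_conf_link() {
    sed -n 's/^ *link currently points to //p' | head -n 1
}

# Succeed only if the given path looks like a CM-deployed client configuration.
is_deployed_conf() {
    case "$1" in
        /etc/spark2/conf.cloudera.*) return 0 ;;
        *) return 1 ;;
    esac
}
```

On a gateway host this could be used as: link=$(alternatives --display spark2-conf | spark2_conf_link); is_deployed_conf "$link" || echo "client configuration not deployed: $link"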
You mentioned that the "service is added in ok status although cloudera could not deploy well".
Can you share with us what the error was? You might want to remove the service from CM and re-add it, making sure that the service is configured according to the documentation. Let us know.
Created 11-14-2017 05:31 AM
I added the spark2 service again (with a gateway on all nodes). Everything went fine (I haven't seen any errors), but the 'spark2-shell' command shows the same error.
Alternatives for spark2:
[root@node-r3 ~]# alternatives --display spark2-conf
spark2-conf - status is auto.
link currently points to /opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist
/opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist - priority 10
Current `best' version is /opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist.
Created on 11-14-2017 06:30 AM - edited 11-14-2017 06:31 AM
link currently points to /opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist
Hmm, that is our problem. If the spark2 gateway instance and the client configurations are correctly deployed, the link automatically points to /etc/spark2/conf.cloudera.spark2_on_yarn. Could you share a screenshot of CM UI > Home page (just displaying all the services) and a screenshot of CM UI > Spark2_on_yarn > Instances?
Does redeploying the client configurations from CM UI > Cluster name (drop-down icon) > Deploy Client Configuration show everything as okay, or is there an error?
From host 'node-r3', can you also share the output of $ ls -l /etc/spark2/ so we can see whether the directory /etc/spark2/conf.cloudera.spark2_on_yarn exists?
BTW, I believe you can just run the spark2-shell command instead of going into the spark2 parcel bin directory and launching ./spark2-shell
Created 11-21-2017 12:08 AM
Answers to your questions:
Could you possibly share a print screen of the CM UI > Home page:
CM UI > Spark2_on_yarn> Instances
CM UI > Cluster name (drop down icon) > Deploy Client Configuration shows all okay or is there an error?
It was ok
ls -ltr /etc/spark2/
lrwxrwxrwx 1 root root 29 nov 21 04:07 conf -> /etc/alternatives/spark2-conf
spark2-shell shows the same error, no matter where I launch the command from. 😞
Created 11-23-2017 07:52 PM
Thank you. The gateway role instances seem fine. However, it's evident that the spark2 client configurations on the gateway node are not deployed (though I understand you don't see any errors while deploying it from CM).
Could you please double check the latest client configuration deployment logs on the host (e.g. node-r3)? You'll find them under /var/run/cloudera-scm-agent/process/ccdeploy_spark2-conf*
Example:
$ cd /var/run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-3339029592499272274/logs
$ cat stderr.log
...
+ cp -a /run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-3339029592499272274/spark2-conf /etc/spark2/conf.cloudera.spark2_on_yarn
+ chown root /etc/spark2/conf.cloudera.spark2_on_yarn
+ chmod -R ugo+r /etc/spark2/conf.cloudera.spark2_on_yarn
+ '[' -e /etc/spark2/conf.cloudera.spark2_on_yarn/topology.py ']'
+ /usr/sbin/update-alternatives --install /etc/spark2/conf spark2-conf /etc/spark2/conf.cloudera.spark2_on_yarn 51
+ /usr/sbin/update-alternatives --auto spark2-conf
$ cat stdout.log
....
using /usr/sbin/update-alternatives as UPDATE_ALTERNATIVES
Deploying service client configs to /etc/spark2/conf.cloudera.spark2_on_yarn
invoking optional deploy script scripts/control.sh
/run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-3339029592499272274/spark2-conf /run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-3339029592499272274
Thu Nov 23 19:09:19 PST 2017: Running Spark2 CSD control script...
Thu Nov 23 19:09:19 PST 2017: Detected CDH_VERSION of [5]
Thu Nov 23 19:09:19 PST 2017: Deploying client configuration
deploy script exited with 0
/run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-3339029592499272274
Let us know if you see any errors or exceptions in these logs.
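If there are many deployment directories, a small script can narrow down where to look. This is a rough sketch, not a Cloudera tool: scan_ccdeploy_logs and deploy_exit_code are hypothetical names, the default directory is taken from the path quoted above, and the 'error|exception' pattern is only a guess that may flag harmless lines.

```shell
# Hypothetical helper: print the deployment log files that mention an error
# or exception. $1 is the agent process directory (default assumed from the
# path quoted above).
scan_ccdeploy_logs() {
    base="${1:-/var/run/cloudera-scm-agent/process}"
    for f in "$base"/ccdeploy_spark2-conf*/logs/stderr.log \
             "$base"/ccdeploy_spark2-conf*/logs/stdout.log; do
        [ -f "$f" ] || continue      # skip patterns that matched nothing
        if grep -Eiq 'error|exception' "$f"; then
            echo "$f"
        fi
    done
}

# Hypothetical helper: print the exit code reported by the
# "deploy script exited with N" line of a stdout.log read from stdin.
deploy_exit_code() {
    sed -n 's/^.*deploy script exited with \([0-9][0-9]*\).*$/\1/p' | head -n 1
}
```

For example, scan_ccdeploy_logs with no argument scans the default location, and deploy_exit_code < stdout.log prints the script's exit status; anything other than 0 would be worth reporting here.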
Created on 11-23-2017 11:43 PM - edited 11-24-2017 12:18 AM
I redeployed Spark2 (5 times so far) and the last directory created was /var/run/cloudera-scm-agent/process/ccdeploy_spark-conf_etcsparkconf.cloudera.spark_on_yarn2_-22632408179870636/logs
ls -ltr
total 2192
-rw-r----- 1 root root 1220 nov 24 08:20 stdout.log.bak
-rw-r----- 1 root root 1115772 nov 24 08:20 stderr.log.bak
-rw-r--r-- 1 root root 1220 nov 24 08:20 stdout.log
-rw-r--r-- 1 root root 1115772 nov 24 08:20 stderr.log
I searched for 'ERROR' and 'exception' (with grep) but found nothing in that directory.
Can "Spark" and "Spark2" coexist? From the error 'java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream' I understand that a library is not being found in some directory.
I have just found something important: this directory is EMPTY:
/opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist
It is the directory that the command "alternatives --display spark2-conf" shows as the 'best version'. I guess the deployment is not working. I upgraded Cloudera from 5.7 to 5.10. Is the Spark2 parcel version cdh5.7.0.p0.118100 in conflict with that?
I will try with this parcel: http://archive.cloudera.com/spark2/parcels/2.2.0/ . EDIT: The installation succeeds, but 'spark2-shell' fails (obviously because this parcel is for cdh 5.12):
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main : Unsupported major.minor version 52.0
Created on 11-24-2017 03:44 AM - edited 11-24-2017 03:51 AM
Can "Spark" and "Spark2" coexist? From the error 'java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream' I understand that a library is not being found in some directory.
Yes, spark(1.x) and spark2 can coexist. spark2 binaries are wrapped separately as spark2-shell, spark2-submit, pyspark2. Both the services are configured to not conflict and run on the same YARN cluster.
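To confirm that the Spark2 wrappers are visible on a given host, something like the sketch below can help. The function name missing_commands is made up for illustration; it only checks the PATH and says nothing about whether the client configuration is deployed.

```shell
# Hypothetical helper: print each of the given command names that cannot be
# found on the current PATH.
missing_commands() {
    for c in "$@"; do
        command -v "$c" >/dev/null 2>&1 || echo "$c"
    done
}
```

Running missing_commands spark2-shell spark2-submit pyspark2 prints nothing when all three wrappers are available.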
The error 'java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream' means that a Hadoop class could not be loaded. In this setup that simply means the client configuration couldn't be found on the host from which you are invoking spark2-shell.
Just now I have found something important, this directory is EMPTY : /opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist
Right, I double checked on a working host. It is empty there too.
It is the directory that command "alternatives --display spark2-conf" shows as 'best version'. I guess that deploy is not working.
Right. This gets changed to /etc/spark2/conf.cloudera.spark2* once the spark2 client configurations are correctly deployed.
Installation is correct but fails (obviously because this parcel is for cdh 5.12) 'spark2-shell': Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main : Unsupported major.minor version 52.0
Well, Spark 2.2 does work on CDH 5.8 and above. However, this message just means that the system couldn't find Java 8 as the default JVM (sorry, the message itself is not clear). Please see: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_requirements.html
For a list of supported (and recommended) JDK 8 versions, please see the documentation,
as well as for the way to install and configure JDK 8 in your CDH cluster.
Once done, you can do:
$ export JAVA_HOME=/usr/java/jdk1.8.0_121
$ spark2-shell
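Class file version 52.0 corresponds to Java 8, so it helps to confirm which Java release the shell actually picks up. A minimal sketch follows; java_major is a hypothetical name, and the parsing assumes the usual quoted version string on the first line of `java -version` output.

```shell
# Hypothetical helper: extract the Java major release from `java -version`
# output read on stdin. Handles both the old "1.8.0_121" scheme (prints 8)
# and the newer "11.0.2" scheme (prints 11).
java_major() {
    sed -n '1s/^[^"]*"\([^"]*\)".*$/\1/p' |
        awk -F'[._]' '{ if ($1 == 1) print $2; else print $1 }'
}
```

Since java writes its version banner to stderr, a call would look like "$JAVA_HOME"/bin/java -version 2>&1 | java_major; a result below 8 would explain the "Unsupported major.minor version 52.0" message.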
BTW, it would be worthwhile to share the full stdout.log and the last few lines of stderr.log from the client configuration deployment directory /var/run/cloudera-scm-agent/process/ccdeploy_spark-conf_etcsparkconf.cloudera.spark_on_yarn2_-22632408179870636/logs
Created 05-09-2018 06:58 AM