Explorer
Posts: 8
Registered: ‎06-29-2017

Spark2 . Error deploying and status ok


I installed Spark2 through Cloudera Manager. The service was added and shows an OK status, although Cloudera Manager could not deploy it cleanly. Now, running 'spark2-shell' shows an error message:

 

[u_m1@cm bin]# ./spark2-shell
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments$$anonfun$mergeDefaultSparkProperties$1.apply(SparkSubmitArguments.scala:118)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.deploy.SparkSubmitArguments.mergeDefaultSparkProperties(SparkSubmitArguments.scala:118)
at org.apache.spark.deploy.SparkSubmitArguments.<init>(SparkSubmitArguments.scala:104)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:117)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)

 

 

How can I deploy the service correctly? Any recommendations?

Cloudera Employee
Posts: 31
Registered: ‎11-16-2015

Re: Spark2 . Error deploying and status ok

This error is almost always the result of not having a Spark2 gateway role configured on the host where you're trying to run spark2-shell (CM > Spark2 > Instances > Gateway). I'd make sure that the steps to add the Spark2 service, including the CSD, were followed correctly, including a restart of CM and CMS, and I'd double-check that the client configuration is correctly deployed (CM > Cluster Name drop-down menu > Deploy Client Configuration).

 

If all is well, you should see the alternatives pointing to /etc/spark2/conf... (required for running spark2-shell):

[u_m1@cm bin]# alternatives --display spark2-conf
spark2-conf - status is auto.
link currently points to /etc/spark2/conf.cloudera.spark2_on_yarn
/opt/cloudera/parcels/SPARK2-2.2.0.cloudera1-1.cdh5.12.0.p0.142354/etc/spark2/conf.dist - priority 10
/etc/spark2/conf.cloudera.spark2_on_yarn - priority 51
Current `best' version is /etc/spark2/conf.cloudera.spark2_on_yarn.
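The check above can be scripted. This is only a minimal sketch: the function names spark2_conf_target and spark2_conf_is_deployed are made up for illustration, and the sketch assumes the output format shown above (a line starting with "link currently points to").

```shell
# Hypothetical helper: reads `alternatives --display spark2-conf` output on
# stdin and prints the path the link currently points to.
spark2_conf_target() {
  sed -n 's/^ *link currently points to //p' | head -n 1
}

# Succeeds only if the link points to a client configuration directory
# deployed under /etc/spark2 (conf.cloudera.*), not to the parcel's conf.dist.
spark2_conf_is_deployed() {
  case "$(spark2_conf_target)" in
    /etc/spark2/conf.cloudera.*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage on a gateway host would look like: alternatives --display spark2-conf | spark2_conf_is_deployed && echo "client config deployed"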

 

You mentioned that the "service is added in ok status although cloudera could not deploy well".

Can you share what the error was? You might want to remove the service from CM and re-add it, making sure that it is configured according to the documentation. Let us know.

 

Explorer
Posts: 8
Registered: ‎06-29-2017

Re: Spark2 . Error deploying and status ok

I added the Spark2 service again (with a gateway on all nodes). Everything went OK (I haven't seen any errors), but the 'spark2-shell' command shows the same error.

 

Alternatives for spark2-conf:

 

[root@node-r3 ~]# alternatives --display spark2-conf
spark2-conf - status is auto.
link currently points to /opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist
/opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist - priority 10
Current `best' version is /opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist.

Cloudera Employee
Posts: 31
Registered: ‎11-16-2015

Re: Spark2 . Error deploying and status ok

link currently points to /opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist

Hmm, that is our problem. If the Spark2 gateway instance and the client configuration are correctly deployed, the link automatically points to /etc/spark2/conf.cloudera.spark2_on_yarn. Could you share a screenshot of CM UI > Home page (just showing all the services) and a screenshot of CM UI > Spark2_on_yarn > Instances?

 

Does redeploying the client configuration from CM UI > Cluster name (drop-down icon) > Deploy Client Configuration complete without problems, or is there an error?

 

From host 'node-r3', can you also share the output of $ ls -l /etc/spark2/ so we can see whether the directory /etc/spark2/conf.cloudera.spark2_on_yarn exists?

 

BTW, I believe you can just run the spark2-shell command directly instead of going into the Spark2 parcel's bin directory and launching ./spark2-shell.

Explorer
Posts: 8
Registered: ‎06-29-2017

Re: Spark2 . Error deploying and status ok

 

Here are my answers to your questions:

 

Could you possibly share a print screen of the CM UI > Home page:

 

Captura2.JPG

 

CM UI > Spark2_on_yarn> Instances

 

Captura3.JPG

 

 CM UI > Cluster name (drop down icon) > Deploy Client Configuration shows all okay or is there an error?

 It was OK.

 

 

ls -ltr /etc/spark2/
lrwxrwxrwx 1 root root 29 nov 21 04:07 conf -> /etc/alternatives/spark2-conf

 

spark2-shell shows the same error no matter where I launch it from. :(

 

 

Cloudera Employee
Posts: 31
Registered: ‎11-16-2015

Re: Spark2 . Error deploying and status ok

Thank you. The gateway role instances seem fine. However, it's evident that the Spark2 client configuration is not deployed on the gateway node (though I understand you don't see any errors while deploying it from CM).

 

Could you please double-check the latest client configuration deployment logs on the host (e.g. node-r3)? You'll find them under /var/run/cloudera-scm-agent/process/ccdeploy_spark2-conf*

 

Example: 

$ cd /var/run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-3339029592499272274/logs

 

$ cat stderr.log

...
+ cp -a /run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-3339029592499272274/spark2-conf /etc/spark2/conf.cloudera.spark2_on_yarn
+ chown root /etc/spark2/conf.cloudera.spark2_on_yarn
+ chmod -R ugo+r /etc/spark2/conf.cloudera.spark2_on_yarn
+ '[' -e /etc/spark2/conf.cloudera.spark2_on_yarn/topology.py ']'
+ /usr/sbin/update-alternatives --install /etc/spark2/conf spark2-conf /etc/spark2/conf.cloudera.spark2_on_yarn 51
+ /usr/sbin/update-alternatives --auto spark2-conf



$ cat stdout.log

....
using /usr/sbin/update-alternatives as UPDATE_ALTERNATIVES
Deploying service client configs to /etc/spark2/conf.cloudera.spark2_on_yarn
invoking optional deploy script scripts/control.sh
/run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-3339029592499272274/spark2-conf /run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-3339029592499272274
Thu Nov 23 19:09:19 PST 2017: Running Spark2 CSD control script...
Thu Nov 23 19:09:19 PST 2017: Detected CDH_VERSION of [5]
Thu Nov 23 19:09:19 PST 2017: Deploying client configuration
deploy script exited with 0
/run/cloudera-scm-agent/process/ccdeploy_spark2-conf_etcspark2conf.cloudera.spark2_on_yarn_-3339029592499272274

 

Let us know if you see any errors or exceptions in these logs.
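To make this check repeatable, something along these lines could be used. It is a rough sketch under assumptions: the function names are hypothetical, the base directory default is the path mentioned above, and the grep patterns are just guesses at what an error might look like. Note that the glob only matches ccdeploy_spark2-conf* directories, so Spark 1.x deployments (ccdeploy_spark-conf*) are not picked up.

```shell
# Hypothetical helper: print the most recently modified Spark2 client-config
# deployment directory under the given agent process directory.
latest_spark2_ccdeploy() {
  base="${1:-/var/run/cloudera-scm-agent/process}"
  ls -dt "$base"/ccdeploy_spark2-conf* 2>/dev/null | head -n 1
}

# Print the deploy script's exit line plus any error-like lines from the logs.
# The patterns below are illustrative assumptions, not an official list.
check_spark2_ccdeploy_logs() {
  dir="$(latest_spark2_ccdeploy "$1")"
  if [ -z "$dir" ]; then
    echo "no ccdeploy_spark2-conf* directory found"
    return 1
  fi
  echo "checking $dir/logs"
  grep -h 'deploy script exited with' "$dir/logs/stdout.log" \
    || echo "no exit line found in stdout.log"
  grep -hiE 'error|exception|no such file|permission denied' \
    "$dir/logs/stderr.log" "$dir/logs/stdout.log" \
    || echo "no error-like lines found"
}
```

Running check_spark2_ccdeploy_logs as root on the gateway host (the logs may not be world-readable) prints the directory it inspected, which also confirms that a Spark2 deployment directory exists at all.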

Explorer
Posts: 8
Registered: ‎06-29-2017

Re: Spark2 . Error deploying and status ok


I redeployed Spark2 (5 times so far) and the last directory created was /var/run/cloudera-scm-agent/process/ccdeploy_spark-conf_etcsparkconf.cloudera.spark_on_yarn2_-22632408179870636/logs

 

ls -ltr
total 2192
-rw-r----- 1 root root 1220 nov 24 08:20 stdout.log.bak
-rw-r----- 1 root root 1115772 nov 24 08:20 stderr.log.bak
-rw-r--r-- 1 root root 1220 nov 24 08:20 stdout.log
-rw-r--r-- 1 root root 1115772 nov 24 08:20 stderr.log

 

I searched for 'ERROR' and 'exception' (with grep) but found nothing in that directory.

 

Can "Spark" and "Spark2" coexist at the same time? From the error 'java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream' I understand that a library is not being found in some directory.

 

I have just found something important: this directory is EMPTY:

 

     /opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist

 

It is the directory that the command "alternatives --display spark2-conf" shows as the 'best' version. I guess the deployment is not working. I upgraded Cloudera from 5.7 to 5.10. Is the Spark2 version cdh5.7.0.p0.118100 in conflict with it?

 

I will try with this parcel: http://archive.cloudera.com/spark2/parcels/2.2.0/ . EDIT: The installation succeeds, but 'spark2-shell' fails (obviously because this parcel is for CDH 5.12):

 

Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main : Unsupported major.minor version 52.0

 

 

 

 

Cloudera Employee
Posts: 31
Registered: ‎11-16-2015

Re: Spark2 . Error deploying and status ok

Can "Spark" and "Spark2" coexist at the same time? From the error 'java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream' I understand that a library is not being found in some directory.

Yes, Spark (1.x) and Spark2 can coexist. The Spark2 binaries are wrapped separately as spark2-shell, spark2-submit, and pyspark2. Both services are configured so that they do not conflict and can run on the same YARN cluster.

 

The error 'java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FSDataInputStream' simply means that the client configuration couldn't be found on the host where you are invoking spark2-shell.
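One way to confirm this on a given host is to resolve the /etc/spark2/conf link and see whether it ends up in a directory that actually contains files. The sketch below assumes GNU readlink (for -f); the function names are hypothetical and not part of any Cloudera tooling.

```shell
# Hypothetical helper: succeed if the directory exists and has at least one entry.
conf_dir_is_populated() {
  [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Resolve the conf symlink chain (default: /etc/spark2/conf) and report
# whether the final target is a non-empty directory.
check_spark2_conf() {
  link="${1:-/etc/spark2/conf}"
  target="$(readlink -f "$link" 2>/dev/null || true)"
  if conf_dir_is_populated "$target"; then
    echo "OK: $link resolves to $target"
  else
    echo "PROBLEM: $link resolves to '$target', which is missing or empty"
    return 1
  fi
}
```

On a host where the link still resolves to the parcel's empty conf.dist directory, check_spark2_conf would report a problem, which matches the symptom in this thread.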

 

 

 

I have just found something important: this directory is EMPTY:
 /opt/cloudera/parcels/SPARK2-2.0.0.cloudera2-1.cdh5.7.0.p0.118100/etc/spark2/conf.dist

Right, I double-checked this on a working host. The directory is empty there too.

 

 

 

It is the directory that the command "alternatives --display spark2-conf" shows as the 'best' version. I guess the deployment is not working.

Right. This changes to /etc/spark2/conf.cloudera.spark2* once the Spark2 client configuration is correctly deployed.

 

 

 

The installation succeeds, but 'spark2-shell' fails (obviously because this parcel is for CDH 5.12):
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/apache/spark/launcher/Main : Unsupported major.minor version 52.0

Well, Spark 2.2 does work on CDH 5.8 and above. However, this message just means that the system couldn't find Java 8 as the default (sorry, the message itself is not clear). Please see: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_requirements.html

 

For the list of supported (and recommended) JDK 8 versions, and for how to install and configure JDK 8 in your CDH cluster, please see the documentation.
 

Once done, you can do:
$ export JAVA_HOME=/usr/java/jdk1.8.0_121
$ spark2-shell
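For context on the message: class-file major version 52 corresponds to Java 8 (the major version minus 44 gives the Java release), so "Unsupported major.minor version 52.0" means that classes compiled for Java 8 are being loaded by an older JVM. A small sketch for checking which Java is the default on a host follows; the function names are hypothetical.

```shell
# Map a class-file major version to the Java release that produces it
# (valid for Java 1.2 and later: 46 -> 2, ..., 51 -> 7, 52 -> 8).
class_major_to_java() {
  echo $(( $1 - 44 ))
}

# Hypothetical helper: read `java -version` output on stdin and print the
# feature release, e.g. 7 for "1.7.0_67", 8 for "1.8.0_121", 11 for "11.0.2".
java_feature_release() {
  sed -n 's/.*version "\(1\.\)\{0,1\}\([0-9][0-9]*\).*/\2/p' | head -n 1
}
```

Since java -version writes to stderr, usage would be: java -version 2>&1 | java_feature_release. If that prints 7, spark2-shell from the Spark 2.2 parcel will keep failing with the error above until JAVA_HOME points to a JDK 8 installation.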

 

BTW, it would be worth sharing the full stdout.log and the last few lines of stderr.log from the client configuration deployment directory /var/run/cloudera-scm-agent/process/ccdeploy_spark-conf_etcsparkconf.cloudera.spark_on_yarn2_-22632408179870636/logs
