Explorer
Posts: 10
Registered: ‎02-11-2015
Accepted Solution

spark sql is not working on CDH5.3

We upgraded to CDH 5.3 and spark-sql is not working. Could anybody please tell us whether we are missing any steps needed to get spark-sql to work? I'm getting the error below.

 

Steps we did:

1. Copy hive-site.xml to /etc/spark/conf

2. Try to start the Thrift server, which fails with the error below.

 

/opt/cloudera/parcels/CDH/lib/spark/sbin/start-thriftserver.sh
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /var/log/spark/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-rkqcrl-odshaun01.out
failed to launch org.apache.spark.sql.hive.thriftserver.HiveThriftServer2:
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 18 more

 

bash-4.1$ ./spark-sql
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/cli/CliDriver
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:342)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.cli.CliDriver
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 18 more

 

Regards,

Venky

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: spark sql is not working on CDH5.3

Spark SQL is shipped unchanged from upstream, so it should mostly work as-is. It is not formally supported, as it's still an alpha component. For this error in particular, have a look at the other threads on this forum. I think the issue is that Spark SQL is not yet compatible with the later version of Hive in CDH, so it's not built with Hive support. Some of it should still work, but you have to at least add the Hive JARs to the classpath.

Explorer
Posts: 10
Registered: ‎02-11-2015

Re: spark sql is not working on CDH5.3

Thanks for the quick reply. I have added the path too, but it is still not working. Correct me if I'm wrong or missing something.

 

#!/bin/bash
export JAVA_HOME=/usr/java/jdk1.7.0_55
SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark/

SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/spark/lib/*.jar

 

JARS=""
for j in `ls /opt/cloudera/parcels/CDH/lib/hadoop/client/*.jar`
do
JARS=$JARS:$j
JARS1=$j,$JARS1
done

CLI=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-cli-0.13.1-cdh5.3.1.jar:/opt/cloudera/parcels/CDH/lib/hive/lib/hive-common-0.13.1-cdh5.3.1.jar:=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-0.13.1-cdh5.3.1.jar:/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec-0.13.1-cdh5.3.1.jar
$SPARK_CLASSPATH:$JARS:$CLI

$SPARK_HOME/bin/spark-sql --master local

Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: spark sql is not working on CDH5.3

You're probably beyond my knowledge, but the immediate error is easy enough to understand: it can't find the Hive classes, so something is still wrong there. I see a typo in your path, for example: two of the JARs are separated by ":=". Is it just that?
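
For reference, a minimal sketch of how the wrapper script above could assemble its classpath without the two problems visible in it: the stray "=" after a ":", and the bare `$SPARK_CLASSPATH:$JARS:$CLI` line, which the shell tries to run as a command rather than assign. `join_jars` and `CDH_LIB` are hypothetical names introduced only for illustration, and the sketch assumes this Spark build still honors the `SPARK_CLASSPATH` environment variable. Later posts in this thread report that fixing the script alone was not enough, so treat this as cleanup rather than a confirmed fix.

```shell
#!/bin/bash
# Sketch only: join_jars and CDH_LIB are hypothetical, not part of Spark or CDH.

# Print every .jar found in the given directories as one colon-separated list.
join_jars() {
    local cp="" dir jar
    for dir in "$@"; do
        for jar in "$dir"/*.jar; do
            [ -e "$jar" ] || continue     # glob matched nothing; skip it
            cp="${cp:+$cp:}$jar"          # add ":" only between entries
        done
    done
    printf '%s\n' "$cp"
}

# Assumed parcel layout, taken from the paths quoted in this thread.
CDH_LIB="${CDH_LIB:-/opt/cloudera/parcels/CDH/lib}"

# Assign and export the result instead of leaving it as a bare word.
SPARK_CLASSPATH="$(join_jars "$CDH_LIB/hadoop/client" "$CDH_LIB/hive/lib")"
export SPARK_CLASSPATH
```

With the variable exported, `$SPARK_HOME/bin/spark-sql --master local` can be launched as in the original script.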

Explorer
Posts: 10
Registered: ‎02-11-2015

Re: spark sql is not working on CDH5.3

My bad. I fixed the typo, but still no luck. Thanks.

New Contributor
Posts: 2
Registered: ‎03-24-2015

Re: spark sql is not working on CDH5.3

I have the same issue. Even with the Hive classpath included, it still gives the error.

Has anyone had any luck with this?
Explorer
Posts: 10
Registered: ‎02-11-2015

Re: spark sql is not working on CDH5.3

Hi,

 

There is a bug in the classpath.

 

You need to add the line CLASSPATH="$CLASSPATH:/opt/cloudera/parcels/CDH/lib/hive/lib/*" to compute-classpath.sh. Then it will work without any issues.
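
A sketch of that change in context, assuming the default parcel layout used in this thread. The entry stays inside double quotes so the shell leaves the `*` alone; the JVM itself expands a classpath entry ending in `/*` to every JAR in that directory. `has_hive_cli` and `HIVE_LIB` are hypothetical names used only for illustration. The check simply guards against a missing hive-cli JAR, which is the one that provides org.apache.hadoop.hive.cli.CliDriver.

```shell
# Sketch only: has_hive_cli and HIVE_LIB are hypothetical names.
# Assumed location of the Hive JARs, taken from the path quoted above.
HIVE_LIB="${HIVE_LIB:-/opt/cloudera/parcels/CDH/lib/hive/lib}"

# Succeed only if the directory contains a hive-cli JAR.
has_hive_cli() {
    ls "$1"/hive-cli-*.jar >/dev/null 2>&1
}

if has_hive_cli "$HIVE_LIB"; then
    # Quoted on purpose: the JVM, not the shell, expands the trailing "/*".
    CLASSPATH="${CLASSPATH:+$CLASSPATH:}$HIVE_LIB/*"
fi
```

If the JAR is absent, the classpath is left unchanged and the NoClassDefFoundError for CliDriver would be expected to persist.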

 

Regards,

Venkat

 

 

New Contributor
Posts: 2
Registered: ‎03-24-2015

Re: spark sql is not working on CDH5.3

Thanks a lot! This does the trick :)
Cloudera Employee
Posts: 481
Registered: ‎08-11-2014

Re: spark sql is not working on CDH5.3

To add a little color: yes, you can do that, although the CLASSPATH intentionally does not include Hive because, as I understand it, Spark doesn't work with the later versions of Hive that CDH 5.3 and beyond use. It may still work well enough to do what you need, so have at it, but you may hit some incompatibilities.

Explorer
Posts: 10
Registered: ‎02-11-2015

Re: spark sql is not working on CDH5.3

I agree. What is the best solution for this?