Support Questions

Find answers, ask questions, and share your expertise

spark sql is not working on CDH5.3

avatar
Contributor

We upgraded to CDH5.3. spark-sql is not working. Could any bodyk please provide us am I missing any steps we need to follow to spark-sql to work. I'm getting the below error.

 

Steps we did:

1.copy the hive-site.xml to /etc/spark/conf

2.Try to start the thrifiserver but getting the error.

 

/opt/cloudera/parcels/CDH/lib/spark/sbin/start-thriftserver.sh
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /var/log/spark/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-rkqcrl-odshaun01.out
failed to launch org.apache.spark.sql.hive.thriftserver.HiveThriftServer2:
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 18 more

 

bash-4.1$ ./spark-sql
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/cli/CliDriver
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:342)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.cli.CliDriver
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 18 more

 

Regards,

Venky

1 ACCEPTED SOLUTION

avatar
New Contributor
thanks a lot! this does the trick:)

View solution in original post

11 REPLIES 11

avatar
Master Collaborator

So, Spark SQL is shipped unchanged from upstream. It should mostly work as-is, as a result. It is not formally supported, as it's still an alpha component. Here in particular, have a look at other threads on this forum. I think the issue is that Spark SQL is not yet compatible with the later version of Hive in CDH, so it's not built with Hive support. Some of it should still work, but you have to add the Hive JARs to the classpath at least.

avatar
Contributor

Thanks for quick reply. I have added the path too.  But still not working. correct me if I'm wrong or missing something.

 

#!/bin/bash
export JAVA_HOME=/usr/java/jdk1.7.0_55
SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark/

SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/spark/lib/*.jar

 

JARS=""
for j in `ls /opt/cloudera/parcels/CDH/lib/hadoop/client/*.jar`
do
JARS=$JARS:$j
JARS1=$j,$JARS1
done

CLI=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-cli-0.13.1-cdh5.3.1.jar:/opt/cloudera/parcels/CDH/lib/hive/lib/hive-common-0.13.1-cdh5.3.1.jar:=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-0.13.1-cdh5.3.1.jar:/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec-0.13.1-cdh5.3.1.jar
$SPARK_CLASSPATH:$JARS:$CLI

$SPARK_HOME/bin/spark-sql --master local

avatar
Master Collaborator

You're probably beyond my knowledge. But the immediate error is easy enough to understand; it can't find the Hive classes, so something is still wrong there. I see a typo in your path for example; there are two jars separated by ":=" Is it just that?

avatar
Contributor

my bad. I fixed the typo. But still no luck. Thanks.

avatar
New Contributor
i have the same issue. Even have hive classpath included, it still gives error.

Anyone has luck on this?

avatar
Contributor

Hi,

 

There is bug in the classpath.

 

You need add a line int the compute-classpath.sh CLASSPATH="$CLASSPATH:/opt/cloudera/parcels/CDH/lib/hive/lib/*" .  Then it will work without any issues.

 

Regards,

Venkat

 

 

avatar
New Contributor
thanks a lot! this does the trick:)

avatar
Master Collaborator

To add a little color, yes you can do that, although the CLASSPATH intentionally does not include Hive, since as I understand, Spark doesn't work with the later versions of Hive that CDH 5.3 and beyond use. It still may work enough to do what you need, so, have at it. But you may hit some incompatibilities.

avatar
Contributor

I agree. What is the best solution for this?