
spark sql is not working on CDH5.3


We upgraded to CDH 5.3 and spark-sql is not working. Could anybody tell us whether we are missing any steps needed to get spark-sql working? I'm getting the error below.

 

Steps we took:

1. Copied hive-site.xml to /etc/spark/conf.

2. Tried to start the Thrift server, but got the error below.

 

/opt/cloudera/parcels/CDH/lib/spark/sbin/start-thriftserver.sh
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /var/log/spark/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-rkqcrl-odshaun01.out
failed to launch org.apache.spark.sql.hive.thriftserver.HiveThriftServer2:
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 18 more

 

bash-4.1$ ./spark-sql
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/cli/CliDriver
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:270)
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:342)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.cli.CliDriver
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 18 more

 

Regards,

Venky
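The ClassNotFoundException for org.apache.hadoop.hive.cli.CliDriver means that no JAR on the launcher's classpath contains that class. Before changing any scripts, it can help to confirm which JAR in the Hive lib directory actually provides it. The helper below is a hypothetical sketch (the name find_class_jar is illustrative) and assumes the Info-ZIP unzip tool is installed:

```shell
#!/bin/bash
# Hypothetical helper: print every JAR in a directory that contains a given class.
# Returns 1 if no JAR contains it. Assumes the Info-ZIP `unzip` tool is installed.
find_class_jar() {
  local class_file="${1//.//}.class"   # org.apache.x.Y -> org/apache/x/Y.class
  local dir="$2" jar found=1
  for jar in "$dir"/*.jar; do
    [[ -e "$jar" ]] || continue        # unmatched glob stays literal; skip it
    # `unzip -l` lists the entries stored in the JAR (a ZIP archive).
    if unzip -l "$jar" 2>/dev/null | grep -qF "$class_file"; then
      echo "$jar"
      found=0
    fi
  done
  return "$found"
}
```

For example, find_class_jar org.apache.hadoop.hive.cli.CliDriver /opt/cloudera/parcels/CDH/lib/hive/lib should print the hive-cli JAR; that JAR (or the whole directory) is what has to end up on Spark's classpath.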


Re: spark sql is not working on CDH5.3

Master Collaborator

So, Spark SQL is shipped unchanged from upstream, which means it should mostly work as-is. It is not formally supported, as it is still an alpha component. For this issue in particular, have a look at other threads on this forum. I think the problem is that Spark SQL is not yet compatible with the later version of Hive in CDH, so it is not built with Hive support. Some of it should still work, but at a minimum you have to add the Hive JARs to the classpath.

Re: spark sql is not working on CDH5.3

Thanks for the quick reply. I have added the path too, but it is still not working. Please correct me if I'm wrong or missing something.

 

#!/bin/bash
export JAVA_HOME=/usr/java/jdk1.7.0_55
SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark/

SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/spark/lib/*.jar

 

JARS=""
for j in `ls /opt/cloudera/parcels/CDH/lib/hadoop/client/*.jar`
do
JARS=$JARS:$j
JARS1=$j,$JARS1
done

CLI=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-cli-0.13.1-cdh5.3.1.jar:/opt/cloudera/parcels/CDH/lib/hive/lib/hive-common-0.13.1-cdh5.3.1.jar:=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-0.13.1-cdh5.3.1.jar:/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec-0.13.1-cdh5.3.1.jar
$SPARK_CLASSPATH:$JARS:$CLI

$SPARK_HOME/bin/spark-sql --master local

Re: spark sql is not working on CDH5.3

Master Collaborator

This is probably beyond my knowledge, but the immediate error is easy enough to understand: it can't find the Hive classes, so something is still wrong there. For example, I see a typo in your path: two of the JARs are separated by ":=". Is it just that?

Re: spark sql is not working on CDH5.3

My bad, I fixed the typo, but still no luck. Thanks.
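The ":=" typo aside, the launcher script above has two other problems that probably keep the Hive JARs off the classpath. First, the line $SPARK_CLASSPATH:$JARS:$CLI is run as a command rather than assigned or exported, so spark-sql never sees it. Second, Java classpath wildcards only accept a bare * as the last path element (dir/*), so lib/*.jar is not expanded by the JVM. The sketch below keeps the paths from that post, builds an explicit colon-separated list of JARs, and passes it with the --driver-class-path option, assuming spark-sql forwards it to spark-submit as upstream Spark 1.x does. Exporting SPARK_CLASSPATH would be the closer fix to the original, but that variable is deprecated in Spark 1.x. The helper name join_jars and the variable names are illustrative.

```shell
#!/bin/bash
# Corrected sketch of the launcher script posted earlier in this thread.
# Assumptions: CDH parcel paths from that post; spark-sql forwards
# --driver-class-path to spark-submit.
export JAVA_HOME=/usr/java/jdk1.7.0_55
SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
HADOOP_CLIENT_DIR=/opt/cloudera/parcels/CDH/lib/hadoop/client
HIVE_LIB_DIR=/opt/cloudera/parcels/CDH/lib/hive/lib

# Join every *.jar file in the given directories into one colon-separated
# classpath string (no leading or trailing colon).
join_jars() {
  local cp="" dir jar
  for dir in "$@"; do
    for jar in "$dir"/*.jar; do
      [[ -e "$jar" ]] || continue      # unmatched glob stays literal; skip it
      cp="${cp:+$cp:}$jar"
    done
  done
  printf '%s\n' "$cp"
}

# Hive JARs (hive-cli, hive-exec, ...) plus the Hadoop client JARs the original script added.
EXTRA_CLASSPATH="$(join_jars "$HIVE_LIB_DIR" "$HADOOP_CLIENT_DIR")"

# Pass the JARs explicitly instead of relying on an unexported shell variable.
if [[ -x "$SPARK_HOME/bin/spark-sql" ]]; then
  "$SPARK_HOME/bin/spark-sql" --master local --driver-class-path "$EXTRA_CLASSPATH"
fi
```

If the Hive classes are still not found, running the script with bash -x shows the exact classpath string handed to spark-sql.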

Re: spark sql is not working on CDH5.3

New Contributor
I have the same issue. Even with the Hive classpath included, it still gives the error.

Has anyone had any luck with this?

Re: spark sql is not working on CDH5.3

Hi,

 

There is a bug in the classpath.

You need to add the line CLASSPATH="$CLASSPATH:/opt/cloudera/parcels/CDH/lib/hive/lib/*" to compute-classpath.sh. Then it will work without any issues.

 

Regards,

Venkat
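Where the new line goes inside compute-classpath.sh matters. Assuming the CDH copy matches upstream Spark 1.x (presumably at /opt/cloudera/parcels/CDH/lib/spark/bin/compute-classpath.sh), the script accumulates $CLASSPATH and finishes by echoing it, so the line has to sit before that final echo; appended after it, the line has no effect. A sketch of the tail of the script with the fix applied:

```shell
#!/bin/bash
# Sketch of the end of compute-classpath.sh with the Hive fix applied.
# Assumption: like upstream Spark 1.x, the script builds $CLASSPATH and
# ends by echoing it; spark-class captures that output as the JVM classpath.

# ... earlier lines of compute-classpath.sh build up $CLASSPATH here ...

# Added line. The trailing /* is a Java classpath wildcard: it is quoted so the
# shell leaves it alone, and the JVM expands it to every JAR in the Hive lib dir.
CLASSPATH="$CLASSPATH:/opt/cloudera/parcels/CDH/lib/hive/lib/*"

echo "$CLASSPATH"
```

Because the file lives inside the parcel, an edit like this will likely need to be reapplied after a CDH upgrade.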

 

 

Re: spark sql is not working on CDH5.3

New Contributor
Thanks a lot! This does the trick :)

Re: spark sql is not working on CDH5.3

Master Collaborator

To add a little color: yes, you can do that, although the CLASSPATH intentionally does not include Hive because, as I understand it, Spark doesn't work with the later versions of Hive that CDH 5.3 and beyond use. It may still work well enough for what you need, so have at it, but you may hit some incompatibilities.
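Whether the workaround works well enough for a given workload is easiest to judge by running the statements you actually depend on. A hypothetical helper (the name spark_sql_smoke_test and the SPARK_SQL_BIN variable are illustrative) that runs one statement non-interactively through spark-sql's Hive-CLI-style -e option and reports the result:

```shell
#!/bin/bash
# Hypothetical smoke test: run one statement through spark-sql and report the outcome.
# SPARK_SQL_BIN defaults to the CDH parcel path used in this thread (an assumption).
SPARK_SQL_BIN="${SPARK_SQL_BIN:-/opt/cloudera/parcels/CDH/lib/spark/bin/spark-sql}"

spark_sql_smoke_test() {
  local query="${1:-SHOW TABLES}"
  # -e runs a single statement and exits, as in the Hive CLI.
  if "$SPARK_SQL_BIN" --master local -e "$query"; then
    echo "spark-sql OK: $query"
  else
    echo "spark-sql FAILED: $query" >&2
    return 1
  fi
}
```

Calling it once per representative query (for example spark_sql_smoke_test "SHOW TABLES") gives a quick pass/fail list before relying on the patched classpath.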


Re: spark sql is not working on CDH5.3

I agree. What is the best solution for this?