Created on 02-23-2015 10:48 AM - edited 09-16-2022 02:22 AM
We upgraded to CDH 5.3, and spark-sql is not working. Could anybody tell me whether I am missing any steps needed to get spark-sql to work? I'm getting the error below.
Steps we did:
1. Copied hive-site.xml to /etc/spark/conf.
2. Tried to start the Thrift server, but got the error below.
/opt/cloudera/parcels/CDH/lib/spark/sbin/start-thriftserver.sh
starting org.apache.spark.sql.hive.thriftserver.HiveThriftServer2, logging to /var/log/spark/spark-root-org.apache.spark.sql.hive.thriftserver.HiveThriftServer2-1-rkqcrl-odshaun01.out
failed to launch org.apache.spark.sql.hive.thriftserver.HiveThriftServer2:
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 18 more
bash-4.1$ ./spark-sql
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/cli/CliDriver
at java.lang.ClassLoader.defineClass1(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:412)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:342)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.cli.CliDriver
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 18 more
Regards,
Venky
Created 02-23-2015 10:58 AM
Spark SQL is shipped unchanged from upstream, so it should mostly work as-is. It is not formally supported, as it's still an alpha component. For this issue in particular, have a look at other threads on this forum. I think the issue is that Spark SQL is not yet compatible with the later version of Hive in CDH, so it's not built with Hive support. Some of it should still work, but you have to add the Hive JARs to the classpath at least.
Created 02-23-2015 11:26 AM
Thanks for the quick reply. I have added the path too, but it is still not working. Correct me if I'm wrong or missing something.
#!/bin/bash
export JAVA_HOME=/usr/java/jdk1.7.0_55
SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark/
SPARK_CLASSPATH=/opt/cloudera/parcels/CDH/lib/spark/lib/*.jar
JARS=""
for j in `ls /opt/cloudera/parcels/CDH/lib/hadoop/client/*.jar`
do
JARS=$JARS:$j
JARS1=$j,$JARS1
done
CLI=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-cli-0.13.1-cdh5.3.1.jar:/opt/cloudera/parcels/CDH/lib/hive/lib/hive-common-0.13.1-cdh5.3.1.jar:=/opt/cloudera/parcels/CDH/lib/hive/lib/hive-jdbc-0.13.1-cdh5.3.1.jar:/opt/cloudera/parcels/CDH/lib/hive/lib/hive-exec-0.13.1-cdh5.3.1.jar
$SPARK_CLASSPATH:$JARS:$CLI
$SPARK_HOME/bin/spark-sql --master local
Created 02-23-2015 11:28 AM
You're probably beyond my knowledge. But the immediate error is easy enough to understand: it can't find the Hive classes, so something is still wrong there. I see a typo in your path, for example; there are two jars separated by ":=". Is it just that?
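One way to confirm which jar (if any) actually provides the missing class is to search the jar listings directly. This is only a rough sketch: `find_class_jar` is a hypothetical helper name, it assumes `unzip` is installed, and the directory is the Hive lib path already used in this thread.

```shell
# find_class_jar CLASS_FILE DIR
# Print each jar in DIR whose file listing contains CLASS_FILE.
# (Hypothetical helper for illustration; relies on `unzip -l`.)
find_class_jar() (
  class_file="$1"
  dir="$2"
  for jar in "$dir"/*.jar; do
    [ -f "$jar" ] || continue   # skip the unexpanded glob when DIR has no jars
    if unzip -l "$jar" 2>/dev/null | grep -qF "$class_file"; then
      printf '%s\n' "$jar"
    fi
  done
)

# Look for the class named in the NoClassDefFoundError above.
find_class_jar "org/apache/hadoop/hive/cli/CliDriver.class" \
  /opt/cloudera/parcels/CDH/lib/hive/lib
```

If this prints nothing, the class is not in that directory at all; if it prints a jar, that jar is the one that has to end up on the classpath Spark really uses.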
Created 02-23-2015 12:03 PM
My bad. I fixed the typo, but still no luck. Thanks.
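Beyond the ":=" typo, two other things in the script posted earlier look suspect: the line `$SPARK_CLASSPATH:$JARS:$CLI` is not an assignment, so the shell tries to run it as a command and nothing is exported, and Java expands a classpath entry of the form `dir/*` but not `dir/*.jar`. Below is a possible sketch of the same idea with those two points addressed. `join_jars` and `CDH_LIB` are names made up for illustration, and the thread does not confirm that this alone resolves the error.

```shell
#!/bin/bash
# join_jars DIR: print every *.jar in DIR as one colon-separated string.
# (Hypothetical helper; runs in a subshell so its variables do not leak.)
join_jars() (
  dir="$1"
  joined=""
  for jar in "$dir"/*.jar; do
    [ -f "$jar" ] || continue            # skip the unexpanded glob when DIR has no jars
    joined="${joined:+$joined:}$jar"     # put ":" only between entries
  done
  printf '%s\n' "$joined"
)

CDH_LIB=/opt/cloudera/parcels/CDH/lib
HADOOP_JARS=$(join_jars "$CDH_LIB/hadoop/client")

# "dir/*" is the classpath wildcard Java understands; export so child processes see it.
export SPARK_CLASSPATH="$CDH_LIB/spark/lib/*:$CDH_LIB/hive/lib/*${HADOOP_JARS:+:$HADOOP_JARS}"

# Then launch as before:
# "$CDH_LIB/spark/bin/spark-sql" --master local
```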
Created 03-24-2015 10:52 AM
Created 03-24-2015 11:32 AM
Hi,
There is a bug in the classpath.
You need to add a line in compute-classpath.sh: CLASSPATH="$CLASSPATH:/opt/cloudera/parcels/CDH/lib/hive/lib/*". Then it will work without any issues.
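A slightly more defensive form of that line could look like the sketch below. `append_hive_lib` is a hypothetical helper, not something that ships with Spark; it only appends the Hive lib wildcard when the directory exists and avoids a leading colon when CLASSPATH starts out empty.

```shell
# append_hive_lib CLASSPATH DIR
# Print CLASSPATH with "DIR/*" appended, but only if DIR exists.
# (Hypothetical helper for illustration.)
append_hive_lib() {
  if [ -d "$2" ]; then
    printf '%s\n' "${1:+$1:}$2/*"   # no leading ":" when the first argument is empty
  else
    printf '%s\n' "$1"
  fi
}

# Inside compute-classpath.sh, after CLASSPATH has been assembled:
CLASSPATH=$(append_hive_lib "$CLASSPATH" /opt/cloudera/parcels/CDH/lib/hive/lib)
```

Since the file belongs to the parcel, an upgrade may well overwrite the change, so it is worth keeping a note of the edit.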
Regards,
Venkat
Created 03-25-2015 09:35 AM
Created 03-25-2015 09:48 AM
To add a little color: yes, you can do that, although the CLASSPATH intentionally does not include Hive since, as I understand it, Spark doesn't work with the later versions of Hive that CDH 5.3 and beyond use. It may still work well enough to do what you need, so have at it. But you may hit some incompatibilities.
Created on 03-25-2015 10:00 AM - edited 03-25-2015 12:37 PM
I agree. What is the best solution for this?