Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Spark sql and Hive tables

avatar

Hi, I installed Spark 1.1.0 and Hive 0.13, I try to run example code

# sc is an existing SparkContext.
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

# Queries can be expressed in HiveQL.
results = sqlContext.sql("FROM src SELECT key, value").collect()

          so I get error:
Exception in thread "Thread-2" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at py4j.reflection.TypeUtil.getClass(TypeUtil.java:265)
at py4j.reflection.TypeUtil.forName(TypeUtil.java:245)
at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:153)
at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:82)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 8 more

Everyone can help me?

1 ACCEPTED SOLUTION

avatar
Explorer

You need to have the hive client jars in your classpath.

View solution in original post

12 REPLIES 12

avatar
Master Collaborator

You can also just add the HIve jars to your app classpath.

The catch here is that Spark doesn't quite support the later version of Hive in CDH. This might work for what you're trying to do, but if you build your own, you're building for a slightly different version of Hive than you run here.

avatar
Explorer

Thanks 

 

avatar
Explorer

I'm also seeing this error.

 

Strangely I am including the jar in the spark submit command:

 

/usr/bin/spark-submit --class com.mycompany.myproduct.spark.sparkhive.Hive2RddTest --master spark://mycluster:7077 --executor-memory 8G --jars hive-common-0.13.1-cdh5.3.1.jar sparkhive.jar "/home/stunos/hive.json" &

 

Is this insufficient to add this to the classpath? It has worked for other dependencies so presumably spark copies the dependencies to the other nodes. I am puzzled by this exception.

 

I can attempt to add this jar to /opt/cloudera/parcels/CDH/spark/lib on each node but at this point it is only a voodoo guess since by my logic the command line argument should have been sufficient. 

 

What do you think? Does this mean I probably have to build Spark?