Created on 11-26-2014 09:34 PM - edited 09-16-2022 02:14 AM
Hi, I installed Spark 1.1.0 and Hive 0.13, and I am trying to run this example code:
# sc is an existing SparkContext.
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")
# Queries can be expressed in HiveQL.
results = sqlContext.sql("FROM src SELECT key, value").collect()
but I get this error:
Exception in thread "Thread-2" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at py4j.reflection.TypeUtil.getClass(TypeUtil.java:265)
at py4j.reflection.TypeUtil.forName(TypeUtil.java:245)
at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:153)
at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:82)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 8 more
Can anyone help me?
Created 11-28-2014 07:02 AM
You need to have the hive client jars in your classpath.
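To see which jar actually provides the class named in the stack trace, a small scan of a jar directory can help. This is only a sketch: the function name jars_containing and the directory you point it at are hypothetical, and it relies on nothing beyond the fact that a jar is a zip archive.

```python
import zipfile
from pathlib import Path

# Class from the stack trace, written as an entry path inside a jar.
MISSING_CLASS = "org/apache/hadoop/hive/conf/HiveConf.class"


def jars_containing(class_entry, jar_dir):
    """Return the paths of jars directly under jar_dir that contain class_entry."""
    matches = []
    for jar in sorted(Path(jar_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as archive:
                if class_entry in archive.namelist():
                    matches.append(str(jar))
        except zipfile.BadZipFile:
            # Skip files that are not valid jar/zip archives.
            continue
    return matches


# Hypothetical usage; replace the directory with your own Hive lib directory:
# print(jars_containing(MISSING_CLASS, "/path/to/hive/lib"))
```

Any jar it reports is a candidate to put on the classpath; an empty list means the class is not in that directory at all.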
Created 03-11-2015 07:21 AM
You can also just add the Hive jars to your app classpath.
The catch here is that Spark doesn't fully support the later version of Hive shipped in CDH. This might work for what you're trying to do, but if you build your own Spark, you're building against a slightly different version of Hive than the one you run here.
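As a rough illustration of adding the jars at submit time, the sketch below assembles a spark-submit argument list that passes the same jars through both --jars and --driver-class-path. The helper name build_spark_submit and all paths in the usage comment are hypothetical, and whether this is enough on a given CDH install is an assumption, not something confirmed here.

```python
import os


def build_spark_submit(app, extra_jars, master=None, main_class=None, app_args=()):
    """Build a spark-submit argument list that ships extra_jars to the cluster
    and also places them on the driver class path."""
    cmd = ["spark-submit"]
    if main_class:
        cmd += ["--class", main_class]
    if master:
        cmd += ["--master", master]
    if extra_jars:
        # --jars expects a comma-separated list of jar paths.
        cmd += ["--jars", ",".join(extra_jars)]
        # --driver-class-path uses the platform path separator (":" on Linux).
        cmd += ["--driver-class-path", os.pathsep.join(extra_jars)]
    cmd.append(app)
    cmd.extend(app_args)
    return cmd


# Hypothetical usage (paths are placeholders, not taken from a real install):
# import subprocess
# subprocess.call(build_spark_submit(
#     "my_app.py", ["/path/to/hive-common.jar", "/path/to/hive-exec.jar"]))
```

Returning a list rather than a single string keeps the arguments safe to pass to subprocess without shell quoting.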
Created 03-11-2015 07:28 AM
Thanks
Created 04-24-2015 01:34 PM
I'm also seeing this error.
Strangely, I am including the jar in the spark-submit command:
/usr/bin/spark-submit --class com.mycompany.myproduct.spark.sparkhive.Hive2RddTest --master spark://mycluster:7077 --executor-memory 8G --jars hive-common-0.13.1-cdh5.3.1.jar sparkhive.jar "/home/stunos/hive.json" &
Is this not sufficient to add it to the classpath? It has worked for other dependencies, so presumably Spark copies the dependencies to the other nodes. I am puzzled by this exception.
I can attempt to add this jar to /opt/cloudera/parcels/CDH/spark/lib on each node but at this point it is only a voodoo guess since by my logic the command line argument should have been sufficient.
What do you think? Does this mean I probably have to build Spark?