Spark sql and Hive tables
Labels:
- Apache Hadoop
- Apache Hive
- Apache Spark
- Security
Created on 11-26-2014 09:34 PM - edited 09-16-2022 02:14 AM
Hi, I installed Spark 1.1.0 and Hive 0.13, and I'm trying to run this example code:
# sc is an existing SparkContext.
from pyspark.sql import HiveContext

sqlContext = HiveContext(sc)

sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'examples/src/main/resources/kv1.txt' INTO TABLE src")

# Queries can be expressed in HiveQL.
results = sqlContext.sql("FROM src SELECT key, value").collect()
but I get this error:
Exception in thread "Thread-2" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:190)
at py4j.reflection.TypeUtil.getClass(TypeUtil.java:265)
at py4j.reflection.TypeUtil.forName(TypeUtil.java:245)
at py4j.commands.ReflectionCommand.getUnknownMember(ReflectionCommand.java:153)
at py4j.commands.ReflectionCommand.execute(ReflectionCommand.java:82)
at py4j.GatewayConnection.run(GatewayConnection.java:207)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 8 more
Can anyone help me?
Created 11-28-2014 07:02 AM
You need to have the Hive client jars on your classpath.
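For example, a sketch of one way to do that at launch time (the glob path below assumes a CDH parcel layout; the jar locations and versions will differ on your cluster, and `your-app.jar` is a placeholder):

```shell
# Hypothetical example: collect the Hive client jars and pass them to
# spark-submit via --jars (which takes a comma-separated list).
# Adjust the path for your Hive installation.
HIVE_JARS=$(echo /opt/cloudera/parcels/CDH/lib/hive/lib/*.jar | tr ' ' ',')

spark-submit \
  --master spark://mycluster:7077 \
  --jars "$HIVE_JARS" \
  your-app.jar
```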
Created 03-11-2015 07:21 AM
You can also just add the Hive jars to your app's classpath.
The catch here is that Spark doesn't quite support the later version of Hive shipped in CDH. This might work for what you're trying to do, but if you build your own Spark, you're building against a slightly different version of Hive than the one you run here.
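If you do end up building Spark yourself, the Spark 1.x build docs describe enabling Hive support with the `-Phive` Maven profile. A sketch (the Hadoop profile and version here are assumptions; match them to your cluster):

```shell
# Build Spark 1.x with Hive support enabled via the -Phive profile.
# The Hadoop profile/version below is illustrative, not specific to this cluster.
mvn -Phive -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
```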
Created 03-11-2015 07:28 AM
Thanks
Created 04-24-2015 01:34 PM
I'm also seeing this error.
Strangely, I am including the jar in the spark-submit command:
/usr/bin/spark-submit --class com.mycompany.myproduct.spark.sparkhive.Hive2RddTest --master spark://mycluster:7077 --executor-memory 8G --jars hive-common-0.13.1-cdh5.3.1.jar sparkhive.jar "/home/stunos/hive.json" &
Is this insufficient to add it to the classpath? It has worked for other dependencies, so presumably Spark copies the dependencies to the other nodes. I am puzzled by this exception.
I could try adding this jar to /opt/cloudera/parcels/CDH/spark/lib on each node, but at this point that's just a guess, since by my logic the command-line argument should have been sufficient.
What do you think? Does this mean I probably have to build Spark?
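One thing worth checking (an assumption on my part, not something verified on this cluster): in Spark 1.x client mode, `--jars` ships jars to the executors but does not necessarily put them on the driver's own classpath, and a `NoClassDefFoundError` for `HiveConf` raised in the driver would fit that. A variant of the command above that also sets the driver classpath:

```shell
# Hypothetical variant: pass the Hive jar to both --jars (executors) and
# --driver-class-path (driver), since --jars alone may not reach the driver
# in Spark 1.x client mode.
/usr/bin/spark-submit \
  --class com.mycompany.myproduct.spark.sparkhive.Hive2RddTest \
  --master spark://mycluster:7077 \
  --executor-memory 8G \
  --jars hive-common-0.13.1-cdh5.3.1.jar \
  --driver-class-path hive-common-0.13.1-cdh5.3.1.jar \
  sparkhive.jar "/home/stunos/hive.json"
```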
