Spark SQL in CDH 5.2 not working

Expert Contributor

I have been trying out Spark SQL in CDH 5.2, using Scala in spark-shell. To test it, I ran a simple SELECT statement:

 

import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.sql.hive._

// Note: spark-shell already provides a SparkContext as sc;
// creating a second one is only needed in a standalone application.
val sparkConf = new SparkConf().setAppName("HiveFromSpark")
val sc = new SparkContext(sparkConf)
val hiveContext = new HiveContext(sc)

import hiveContext.sql

println("Result of 'SELECT *': ")
sql("SELECT * FROM sample_07 limit 10").collect().foreach(println)

sc.stop()

 

Creating the HiveContext gave me this error:

 

scala> val hiveContext = new HiveContext(sc)
error: bad symbolic reference. A signature in HiveContext.class refers to term hive
in package org.apache.hadoop which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling HiveContext.class.
error:
while compiling: <console>
during phase: erasure
library version: version 2.10.4
compiler version: version 2.10.4
[... compiler AST dump for the REPL wrapper classes elided ...]

uncaught exception during compilation: scala.reflect.internal.Types$TypeError
scala.reflect.internal.Types$TypeError: bad symbolic reference. A signature in HiveContext.class refers to term conf
in value org.apache.hadoop.hive which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling HiveContext.class.
That entry seems to have slain the compiler. Shall I replay
your session? I can re-run each line except the last one.
[y/n]Replaying: import org.apache.spark._
error:

5 Replies

Re: Spark SQL in CDH 5.2 not working

Master Collaborator

I believe this requires you to add the Hive jars to the classpath of your Spark job.
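For example, on a parcel-based CDH install you can let spark-shell ship the Hive libraries itself with --jars. This is only a sketch: the parcel path below is an assumption for a default install, so adjust it to your layout.

# Assumed default parcel location for CDH; adjust to your install.
HIVE_LIB=/opt/cloudera/parcels/CDH/lib/hive/lib

# Build a comma-separated jar list, which is what --jars expects.
HIVE_JARS=$(echo "$HIVE_LIB"/*.jar | tr ' ' ',')

# Launch the shell with the Hive jars shipped to driver and executors.
spark-shell --jars "$HIVE_JARS"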

Re: Spark SQL in CDH 5.2 not working

Expert Contributor

I tried that, but it gave me an error like this:

Failed to initialize compiler: object scala.runtime in compiler mirror not found
Note that as of 2.8 scala does not assume use of the java classpath. For the old behavior pass -usejavacp to scala, or if using a Settings object programmatically, settings.usejavacp.value = true.
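An alternative that avoids editing the REPL's own classpath by hand is to put the Hive jars on the driver and executor classpaths via spark-defaults.conf. A sketch, assuming a parcel-based CDH layout and the default config location (/etc/spark/conf/spark-defaults.conf); the trailing /* is a JVM classpath wildcard expanded at launch:

# Assumed paths for a default CDH parcel install; adjust to your layout.
spark.driver.extraClassPath     /opt/cloudera/parcels/CDH/lib/hive/lib/*
spark.executor.extraClassPath   /opt/cloudera/parcels/CDH/lib/hive/lib/*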

Re: Spark SQL in CDH 5.2 not working

Master Collaborator

Hm, that's beyond my knowledge. Spark SQL isn't officially supported in CDH, but I have certainly seen it work. This looks like an error from Scala itself.

Re: Spark SQL in CDH 5.2 not working

Expert Contributor

I added these jar files (see the attached SparkJarFiles.jpg).

Does CDH 5.3 have better support for Spark than CDH 5.2?

Re: Spark SQL in CDH 5.2 not working

Master Collaborator

Spark SQL works in CDH 5.3 as well; I use it, and it's no different from upstream Spark SQL. That said, it is not officially supported in any version of CDH.
