How to load Hive data into spark-shell

I have downloaded the Cloudera QuickStart 5.10 VM for VirtualBox, but it's not loading Hive data into Spark.

 

import org.apache.spark.sql.hive.HiveContext
import sqlContext.implicits._
val hiveObj = new HiveContext(sc)

hiveObj.refreshTable("db.table") // if you have upgraded your Hive, do this to refresh the tables

val sample = sqlContext.sql("select * from table").collect()
sample.foreach(println)

 

I'm still getting the "table not found" error (it's not accessing the metadata).

What should I do? Can anyone please help me?

(In the Cloudera QuickStart VM we are unable to copy hive-site.xml into spark/conf.)

 

14 Replies

Champion

@hadoopSparkZen

 

You have to declare the variable sqlContext before the import, as follows, but you are using hiveObj instead. Once you have done the steps below, you can use sqlContext to interact with Hive:

 

val sqlContext = new HiveContext(sc)
import sqlContext.implicits._
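
For example, assuming your Hive table is db.table as in your original snippet, you could then query it with:

// query through the Hive-aware sqlContext declared above
val sample = sqlContext.sql("select * from db.table")
sample.show()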

Thank you for your reply, but it's still not loading.

It's giving the error "table not found".

Could you please show how to load Hive data in spark-shell? (I'm using Cloudera QuickStart 5.10 in VirtualBox.)

 

Champion
Replace this in your code:

val sample = hiveObj.sql("select * from table").collect()

You need to use hiveObj instead of sqlContext.

Let me know if that helps.

Champion

@hadoopSparkZen

 

Try this, it will work. (Note: log in to Hive and make sure the table exists.)

 

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.sql.hive.HiveContext

val conf = new SparkConf().setAppName("Test").setMaster("yarn-client")
val sc = new SparkContext(conf)
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._

val resultDF = sqlContext.sql("select * from test.emp where empid=100")
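
Note: in spark-shell, sc is already created for you, and constructing a second SparkContext will fail, so inside the shell you would only need this part (a minimal sketch of the same steps):

// sc already exists in spark-shell; do not build a second SparkContext
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
import sqlContext.implicits._
val resultDF = sqlContext.sql("select * from test.emp where empid=100")
resultDF.show()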

Thank you for the quick replies, Saranvisa and csguna. Please see below.

The table exists and its data loads as:

scala> val l = sc.textFile("/user/hive/warehouse/cloudera.db/test1").collect().foreach(println)
1|Raj|200
2|Rahul|300
3|Ram|400
4|Sham|250
5|John|500
l: Unit = ()

scala>

 

But the part below is not working. Could you please check it? (It's also showing a metastore warning in the 2nd step.) I tried different ways, but it keeps showing "table not found":

 

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> val sqlContext = new HiveContext(sc)
17/07/03 22:47:30 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@36bcf0b6

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> val r = sqlContext.sql("select * from cloudera.test1")
org.apache.spark.sql.AnalysisException: Table not found: `cloudera`.`test1`; line 1 pos 23
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:54)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:121)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
    at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$foreachUp$1.apply(TreeNode.scala:120)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:120)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$class.checkAnalysis(CheckAnalysis.scala:50)
    at org.apache.spark.sql.catalyst.analysis.Analyzer.checkAnalysis(Analyzer.scala:44)
    at org.apache.spark.sql.execution.QueryExecution.assertAnalyzed(QueryExecution.scala:34)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:133)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:52)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:817)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:33)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:44)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:46)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:48)
    at $iwC$$iwC$$iwC.<init>(<console>:50)
    at $iwC$$iwC.<init>(<console>:52)
    at $iwC.<init>(<console>:54)
    at <init>(<console>:56)
    at .<init>(<console>:60)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1045)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1326)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:821)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:852)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:800)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1064)
    at org.apache.spark.repl.Main$.main(Main.scala:35)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Champion

As mentioned in my post, replace the below line of code

 

scala> val r = sqlContext.sql("select * from cloudera.test1")

with 

 

val sample = hiveObj.sql("select * from table").collect()

You need to use hiveObj. This will fix the error.

Hi Guna,

I did as you said, but the same thing is repeating. Please see below:

 

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> val hiveObj = new HiveContext(sc)
17/07/04 02:10:55 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException
hiveObj: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@3474ddfe

scala> hiveObj.refreshTable("cloudera.test1")

scala> val s = hiveObj.sql("select * from cloudera.test1").collect()
org.apache.spark.sql.AnalysisException: Table not found: `cloudera`.`test1`; line 1 pos 23

    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:54)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1.apply(CheckAnalysis.scala:50)
    at org.apache.spark.sql.catalyst.trees.TreeNode.foreachUp(TreeNode.scala:121)

 

Champion

After running the below statement

 

scala> hiveObj.refreshTable("cloudera.test1")

try to see if you can list the tables in the database using show, after the refresh:

val df = hiveObj.sql("show tables in database_name")
df.show()
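
If that comes back empty, another way to list the tables (a sketch; tableNames is available on SQLContext/HiveContext in Spark 1.x, and database_name is a placeholder) is:

// database_name is a placeholder; prints every table Spark can see in it
hiveObj.tableNames("database_name").foreach(println)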

Hi Guna,

I tried as you said. Both the default and cloudera databases have tables, but it's not showing any tables.

When I try "show databases" it doesn't show both databases, only default. Please see below:

 

scala> val df = hiveObj.sql("show tables in cloudera")
df: org.apache.spark.sql.DataFrame = [tableName: string, isTemporary: boolean]

scala> df.show()
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
+---------+-----------+


scala> val df1 = hiveObj.sql("show tables in default")
df1: org.apache.spark.sql.DataFrame = [tableName: string, isTemporary: boolean]

scala> df1.show()
+---------+-----------+
|tableName|isTemporary|
+---------+-----------+
+---------+-----------+


scala> val df2 = hiveObj.sql("show databases")
df2: org.apache.spark.sql.DataFrame = [result: string]

scala> df2.show()
+-------+
| result|
+-------+
|default|
+-------+

 

 

Champion

Now that's the reason it says table not found, mate.

I will dig more and come back to you; we have almost narrowed it down.

Could you see if you have hive-site.xml & hdfs-site.xml in your Spark conf folder:

 

/etc/spark/conf/

If not, just use the cp command to push those xml files to /etc/spark/conf/ and restart Spark; see the example below.

Fire it again and let us see.
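
For example (a sketch; the paths are the QuickStart VM defaults, and sudo may be needed):

# copy the Hive and HDFS client configs so Spark can reach the metastore
sudo cp /etc/hive/conf/hive-site.xml /etc/spark/conf/
sudo cp /etc/hadoop/conf/hdfs-site.xml /etc/spark/conf/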

 

 

 

Thank you Guna.

I linked the Hive configuration file to Spark as:

ln -s /etc/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml

It started working only after the restart.
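
For example, a quick check against the same table as above now succeeds:

// recreate the HiveContext in a fresh spark-shell after the restart
val hiveObj = new org.apache.spark.sql.hive.HiveContext(sc)
hiveObj.sql("show databases").show()
hiveObj.sql("select * from cloudera.test1").show()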

 

Champion

Hurray! 🙂 @hadoopSparkZen

New Contributor

@csguna and @hadoopSparkZen, you guys have saved my day. Thanks to both of you 🙂

New Contributor

Hi, I am also facing the same issue: not able to load a Hive table into Spark.

 

I tried to copy the xml files into the Spark conf folder, but I get permission denied. I also tried to change the permissions on the folder; that is not working either.

 

Using Cloudera VM 5.12.
