Support Questions

Find answers, ask questions, and share your expertise

How to access Hive tables from spark-shell

Contributor

Hi,


I am trying to access an already existing Hive table from spark-shell, but when I run the query I get a "table not found" error.

For example, the table "departments" exists in the default database in Hive.


I start spark-shell and execute the following instructions:


import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
val depts = sqlContext.sql("select * from departments")
depts.collect().foreach(println)



But Spark couldn't find the table.



Now my questions are:

1. As I understand it, by using HiveContext Spark can access the Hive metastore, but that is not happening here. Is any configuration setup required? I am using Cloudera QuickStart VM 5.5.

2. As an alternative, I created a table in spark-shell, loaded a data file, ran some queries, and then exited the shell.

3. Even though I created the table using spark-shell, it does not exist anywhere when I try to access it from the Hive editor.

4. When I start spark-shell again, the table I created earlier no longer exists. So where exactly are this table and its metadata stored?


I am very confused, because according to the theory, the table should go into the Hive metastore.
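One likely explanation (an assumption worth verifying on your VM): if spark-shell does not find a hive-site.xml on its classpath, HiveContext falls back to a local embedded Derby metastore created in whatever directory the shell was launched from, so tables only "exist" there and vanish when you start the shell elsewhere. You can check for this leftover Derby state:

```shell
# If Spark fell back to an embedded metastore, a metastore_db directory
# (and a derby.log) appear in the directory where spark-shell was launched.
ls -d metastore_db derby.log 2>/dev/null || echo "no local Derby metastore here"
```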

Thanks & Regards

1 ACCEPTED SOLUTION

Visitor

Hi there, 

 

Just in case someone still needs the solution, here is what I tried, and it works:

 

spark-shell --driver-java-options "-Dhive.metastore.uris=thrift://quickstart:9083"

 

I am using Spark 1.6 with the Cloudera VM.

 

val df = sqlContext.sql("show databases")

df.show

 

You should be able to see all the databases in Hive. I hope it helps.
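As an alternative to passing the JVM flag on every launch, the same property can be set in a hive-site.xml under Spark's conf directory. This is a sketch: the thrift URI below is the Cloudera QuickStart VM default and is an assumption for any other cluster.

```xml
<!-- /etc/spark/conf/hive-site.xml (sketch; adjust host/port for your cluster) -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://quickstart:9083</value>
  </property>
</configuration>
```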


17 REPLIES

Explorer
Yes, I have a Spark gateway on the host and I copied hive-site.xml into /etc/spark/conf.
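For reference, the copy step described above can be sketched as follows; the source path assumes a CDH-style layout and may differ on your cluster.

```shell
# Make Hive's client configuration visible to Spark (CDH-style paths; adjust as needed)
sudo cp /etc/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml
```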


Visitor

On the Spark configuration page I don't have the Hive checkbox either.

Try installing another version of Spark.

New Member
I tried this, but I get permission denied.
Can you please help?


Hi,

 

Did you fix this issue?

New Member

Try qualifying the table with its database, e.g. "select * from db.table", in line 3.

New Member

Hi,


I am trying to access an already existing Hive table using pyspark.

For example, the table "departments" exists in the default database in Hive.

Error message:

 

18/10/15 22:01:23 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/10/15 22:02:35 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0-cdh5.13.0
18/10/15 22:02:38 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException

 

I checked the files below; they are the same:

 

/usr/lib/hive/conf/hive-site.xml

 

/usr/lib/spark/conf/hive-site.xml

 

Any help on how to set up the HiveContext from pyspark is highly appreciated.
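One approach is to point the pyspark driver at the Hive metastore the same way as for spark-shell. This is a sketch: quickstart:9083 is the metastore address on the Cloudera QuickStart VM and is an assumption for other clusters.

```shell
# Launch pyspark with the metastore URI (adjust host/port for your cluster)
pyspark --driver-java-options "-Dhive.metastore.uris=thrift://quickstart:9083"
```

Once the shell is up, `sqlContext.sql("show databases").show()` should list the Hive databases.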


New Contributor

You are a lifesaver. I had been struggling with this for 7-8 hours and my deadline to submit a case study was close. Thanks a lot!