How to access Hive tables from spark-shell

Contributor

Hi,


I am trying to access an existing Hive table from the Spark shell,

but when I run the statements below I get a "table not found" error.

For example, a table named "department" exists in Hive in the default database.


I start spark-shell and execute the following statements:


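import org.apache.spark.sql.hive.HiveContext
// sc is the SparkContext that spark-shell provides
val sqlContext = new HiveContext(sc)
val depts = sqlContext.sql("select * from departments")
// collect() brings all rows to the driver before printing them
depts.collect().foreach(println)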



But it couldn't find the table.



Now my questions are:

1. As far as I know, Spark can access the Hive metastore through HiveContext. That is not happening here, so is some configuration required? I am using the Cloudera QuickStart VM 5.5.

2. As an alternative, I created the table in spark-shell, loaded a data file, ran some queries, and then exited the Spark shell.

3. Even though I created the table in spark-shell, it does not show up when I try to access it from the Hive editor.

4. When I started spark-shell again, the table I had created earlier no longer existed. So where exactly are this table and its metadata stored?


I am quite confused, because in theory the table should end up in the Hive metastore.

Thanks & Regards

1 ACCEPTED SOLUTION

New Contributor

Hi there,

Just in case someone still needs the solution, here is what I tried, and it works.

spark-shell --driver-java-options "-Dhive.metastore.uris=thrift://quickstart:9083"

I am using Spark 1.6 with the Cloudera VM.

val df = sqlContext.sql("show databases")

df.show

You should be able to see all the databases in Hive. I hope it helps.
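
If it helps, here is a rough sketch of how one might double-check the result from inside the shell. It assumes a spark-shell started with the option above, so `sqlContext` already exists, and it takes the database name "default" from the question. As far as I know, when Spark cannot reach a Hive metastore it falls back to an embedded Derby metastore and creates a `metastore_db` directory in the directory the shell was started from, which may be why tables created in one session never appear in the Hive editor.

```scala
// Paste into spark-shell; `sqlContext` is the context the shell creates.
// "default" is the database name mentioned in the question.

// List the tables Spark can see in the "default" database.
val visible: Array[String] = sqlContext.tableNames("default")
visible.foreach(println)

// A metastore_db directory in the current working directory suggests that
// some earlier session used a local embedded Derby metastore instead of
// the shared Hive metastore service.
val localMetastore = new java.io.File("metastore_db")
println(s"local metastore_db directory present: ${localMetastore.isDirectory}")
```

If the table from Hive shows up in the list, the shell is talking to the same metastore as Hive.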


17 REPLIES

Explorer
Yes, I have a spark gateway on the host and I copied hive-site.xml into /etc/spark/conf.


Explorer

On the Spark configuration page I don't have a Hive checkbox either.

Try installing another version of Spark.

New Contributor
I tried this, but I get a permission denied error.
Can you please help?

New Contributor

Hi,

 

Did you fix this issue?

New Contributor

Try "select * from db.table" in line 3
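
For what it's worth, a minimal sketch of that suggestion, assuming spark-shell (where `sqlContext` already exists) and using "default" and "departments" from the question as the database and table names. The question mentions both "department" and "departments", so use whichever name `show tables` actually reports.

```scala
// Paste into spark-shell; qualify the table with its database name.
// "default.departments" is taken from the question and may differ on your system.
val depts = sqlContext.sql("select * from default.departments")

// Print a few rows instead of collecting the whole table to the driver.
depts.show(5)
```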

New Contributor

Hi,


I am trying to access an existing Hive table from pyspark.

For example, a table named "department" exists in Hive in the default database.

Error message:

 

18/10/15 22:01:23 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/10/15 22:02:35 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0-cdh5.13.0
18/10/15 22:02:38 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException

 

I checked the files below; they are identical.

 

/usr/lib/hive/conf/hive-site.xml

 

/usr/lib/spark/conf/hive-site.xml

 

Any help on how to set up the HiveContext from pyspark is highly appreciated.

New Contributor

You are a lifesaver. I had been struggling with this for 7-8 hours, and my deadline to submit a case study was close. Thanks a lot!