Support Questions
Find answers, ask questions, and share your expertise

how to access the hive tables from spark-shell

Explorer

Hi,


I am trying to access an already existing Hive table using the spark-shell.

But when I run the instructions, I get a "table not found" error.

For example, a table named "departments" exists in the default Hive database.


I start the spark-shell and execute the following set of instructions:


import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
val depts = sqlContext.sql("select * from departments")
depts.collect().foreach(println)



But it couldn't find the table.



Now my questions are:

1. As I understand it, by using a HiveContext Spark can access the Hive metastore. But it is not doing so here, so is there any configuration setup required? I am using the Cloudera QuickStart VM 5.5.

2. As an alternative, I created a table in the spark-shell, loaded a data file, performed some queries, and then exited the spark-shell.

3. Even though I create the table using the spark-shell, it does not exist anywhere when I try to access it using the Hive editor.

4. When I start the spark-shell again, the table I created earlier no longer exists. So where exactly are this table and its metadata stored?


I am very confused, because according to the theory, it should go into the Hive metastore.

Thanks & Regards

16 REPLIES

Re: how to access the hive tables from spark-shell

Explorer

To connect to the Hive metastore you need to copy the hive-site.xml file into the spark/conf directory. After that, Spark will be able to connect to the Hive metastore.
So run the following command after logging in as the root user:

 

cp  /usr/lib/hive/conf/hive-site.xml    /usr/lib/spark/conf/
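If the error persists after copying, it can be worth checking that the file actually landed in Spark's conf directory and names a metastore URI. A minimal sketch — the paths are the QuickStart VM defaults, and `check_hive_conf` is a hypothetical helper written for illustration, not a Cloudera tool:

```shell
# Report whether a conf directory contains a hive-site.xml that names a
# metastore URI. check_hive_conf is a hypothetical helper for illustration.
check_hive_conf() {
  local conf_dir="$1"
  if [ ! -f "$conf_dir/hive-site.xml" ]; then
    echo "missing"                 # Spark has no Hive config at all
  elif grep -q "hive.metastore.uris" "$conf_dir/hive-site.xml"; then
    echo "ok"                      # config present and names a metastore
  else
    echo "no-metastore-uri"        # file exists but lacks the metastore URI
  fi
}

check_hive_conf "/usr/lib/spark/conf"
```

If this prints "missing" or "no-metastore-uri", the spark-shell will fall back to a local metastore and your Hive tables will not be visible.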

Re: how to access the hive tables from spark-shell

New Contributor

Or you can create a symbolic link to avoid file-version syncing issues:

ln -s /usr/lib/hive/conf/hive-site.xml    /usr/lib/spark/conf/hive-site.xml
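A quick demonstration of why the symlink stays in sync while a copy can drift. This sketch uses temporary directories so it is safe to run anywhere; on the actual VM the directories would be /usr/lib/hive/conf and /usr/lib/spark/conf:

```shell
# Simulate Hive and Spark conf directories with temp dirs.
hive_conf=$(mktemp -d)
spark_conf=$(mktemp -d)

echo "version-1" > "$hive_conf/hive-site.xml"
ln -s "$hive_conf/hive-site.xml" "$spark_conf/hive-site.xml"

# Later, the Hive configuration is updated...
echo "version-2" > "$hive_conf/hive-site.xml"

# ...and the link in spark_conf sees the update with no extra syncing.
cat "$spark_conf/hive-site.xml"   # prints "version-2"
```

With a plain `cp`, the second write would leave Spark reading the stale "version-1" copy until someone remembered to re-copy the file.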

Re: how to access the hive tables from spark-shell

Contributor

The issue still persists.

What else can we do to make it work, other than copying hive-site.xml?

Re: how to access the hive tables from spark-shell

Contributor

Which version of Spark are you using?

Assuming you are using v1.4 or higher:

 

import org.apache.spark.sql.hive.HiveContext
val hiveObj = new HiveContext(sc)
import hiveObj.implicits._

hiveObj.refreshTable("db.table") // if you have upgraded your Hive, do this to refresh the table metadata

val sample = hiveObj.sql("select * from db.table").collect()
sample.foreach(println)

 

This has worked for me

Re: how to access the hive tables from spark-shell

I have downloaded the Cloudera QuickStart 5.10 VM for VirtualBox.

But it's not loading Hive data into Spark:

 

import org.apache.spark.sql.hive.HiveContext
val hiveObj = new HiveContext(sc)
import hiveObj.implicits._

hiveObj.refreshTable("db.table") // if you have upgraded your Hive, do this to refresh the table metadata

val sample = hiveObj.sql("select * from db.table").collect()
sample.foreach(println)

 

I'm still getting the "table not found" error (it's not accessing the metastore).

What should I do? Can anyone please help me?

Re: how to access the hive tables from spark-shell

New Contributor

I'm having the same issue. I'm using CDH 5.10 with Spark on YARN.

 

Also, is there a way to include hive-site.xml through Cloudera Manager? At the moment I have a script to make sure that the symlink is present (and links to the correct hive-site.xml) across the whole cluster, but getting Cloudera Manager to do it for me would be easier, faster, and less error prone.


Re: how to access the hive tables from spark-shell

New Contributor

Hi!

 

Last week I resolved the same problem for Spark 2.

To do this, I selected the Hive Service dependency on the Spark 2 service Configuration page (Service-Wide category):

Spark2.png

After the stale services were restarted, Spark 2 started to work correctly.

Re: how to access the hive tables from spark-shell

Explorer

I am having the same issue, and copying hive-site.xml did not resolve it for me. I am not using Spark 2, but the v1.6 that comes with Cloudera 5.13, and there is no Spark/Hive configuration setting. Was anyone else able to figure out how to fix this? Thanks!

Re: how to access the hive tables from spark-shell

New Contributor

Hi!

 

Have you installed the appropriate Gateway roles on the hosts where these configuration settings are required?