
how to access the hive tables from spark-shell


Explorer

Hi,


I am trying to access an already existing table in Hive by using spark-shell.

But when I run the statements, I get a "table not found" error.

For example, a table named "department" exists in Hive in the default database.


I start spark-shell and execute the following set of statements.


import org.apache.spark.sql.hive.HiveContext
val sqlContext = new HiveContext(sc)
val depts = sqlContext.sql("select * from departments")
depts.collect().foreach(println)



But it couldn't find the table.



Now my questions are:

1. As I understand it, by using HiveContext Spark can access the Hive metastore. But that is not happening here, so is there any configuration setup required? I am using the Cloudera QuickStart VM 5.5.

2. As an alternative, I created the table in spark-shell, loaded a data file, ran some queries, and then exited the spark shell.

3. Even though I create the table using spark-shell, it does not exist anywhere when I try to access it using the Hive editor.

4. When I start spark-shell again, the table I created earlier no longer exists. So where exactly are this table and its metadata stored?


I am very much confused, because according to theory it should go into the Hive metastore.
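
For reference, this is roughly what I run to see which databases and tables the shell can find (a minimal sketch; sc is the SparkContext that spark-shell creates automatically):

import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)

// List what this session can actually see.
sqlContext.sql("show databases").collect().foreach(println)
sqlContext.sql("show tables").collect().foreach(println)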

Thanks & Regards

16 REPLIES

Re: how to access the hive tables from spark-shell

Explorer

To connect to the Hive metastore you need to copy the hive-site.xml file into the spark/conf directory. After that, Spark will be able to connect to the Hive metastore.
So run the following command after logging in as the root user:

 

cp  /usr/lib/hive/conf/hive-site.xml    /usr/lib/spark/conf/
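
This also explains points 3 and 4 of the question: without hive-site.xml on its classpath, Spark's HiveContext falls back to an embedded Derby metastore (it creates a metastore_db directory wherever spark-shell was launched), so tables created there are invisible to Hive, and a new shell started in a different directory will not see them either. A quick way to confirm the copy worked (a sketch, assuming the QuickStart VM paths above and the Spark 1.x shell):

// Start a new spark-shell after copying hive-site.xml, then:
import org.apache.spark.sql.hive.HiveContext

val sqlContext = new HiveContext(sc)
sqlContext.sql("show tables in default").collect().foreach(println)
sqlContext.sql("select * from departments").collect().foreach(println)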

Re: how to access the hive tables from spark-shell

New Contributor

Or you can create a symbolic link to avoid file version syncing issues:

ln -s /usr/lib/hive/conf/hive-site.xml    /usr/lib/spark/conf/hive-site.xml

Re: how to access the hive tables from spark-shell

Contributor

The issue still persists.

What else can we do to make it work, other than copying hive-site.xml?

Re: how to access the hive tables from spark-shell

Contributor

Which version of Spark are you using?

Assuming you are using v1.4 or higher:

 

import org.apache.spark.sql.hive.HiveContext

val hiveObj = new HiveContext(sc)
import hiveObj.implicits._

// If you have upgraded your Hive installation, do this to refresh the table metadata.
hiveObj.refreshTable("db.table")

val sample = hiveObj.sql("select * from table").collect()
sample.foreach(println)

 

This has worked for me

Re: how to access the hive tables from spark-shell

I have downloaded the Cloudera QuickStart 5.10 VM for VirtualBox.

But it's not loading Hive data into Spark.

 

import org.apache.spark.sql.hive.HiveContext

val hiveObj = new HiveContext(sc)
import hiveObj.implicits._

// If you have upgraded your Hive installation, do this to refresh the table metadata.
hiveObj.refreshTable("db.table")

val sample = hiveObj.sql("select * from table").collect()
sample.foreach(println)

 

I'm still getting the "table not found" error (it's not accessing the metastore).

What should I do? Can anyone please help me?

Re: how to access the hive tables from spark-shell

New Contributor

I'm having the same issue. I'm using CDH 5.10 with Spark on YARN.

Also, is there a way to include hive-site.xml through Cloudera Manager? At the moment I have a script to make sure that the symlink exists (and links to the correct hive-site.xml) across the whole cluster, but getting Cloudera Manager to do it for me would be easier, faster and less error-prone.

Re: how to access the hive tables from spark-shell

New Contributor

Hi!

 

Last week I resolved the same problem for Spark 2.

For this, I selected the Hive Service dependency on the Spark 2 service Configuration page (Service-Wide category):

[Screenshot: Spark2.png]

After the stale services were restarted, Spark 2 started to work correctly.
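
Once the Hive service dependency is set and the stale services are restarted, a quick check from the Spark 2 shell (a sketch, assuming the spark-shell that ships with the Spark 2 parcel, where spark is the SparkSession it creates for you):

// The pre-built SparkSession `spark` should have Hive support enabled,
// so the Hive databases and tables should now be visible.
spark.sql("show databases").show()
spark.sql("show tables in default").show()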

Re: how to access the hive tables from spark-shell

Explorer

I am having the same issue, and copying hive-site.xml did not resolve it for me. I am not using Spark 2 but the 1.6 that comes with Cloudera 5.13, and there is no Spark/Hive configuration setting. Was anyone else able to figure out how to fix this? Thanks!
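
One thing worth checking on 1.6 (a hedged sketch, not a confirmed fix): whether the spark-shell you are launching actually picks up the copied hive-site.xml, or silently falls back to a local Derby metastore. A metastore_db directory appearing where you launched spark-shell is a sign the file is not on the classpath of that particular Spark install.

import org.apache.spark.sql.hive.HiveContext

val hiveCtx = new HiveContext(sc)

// If only "default" is listed and your Hive databases are missing, this
// shell is not talking to the Hive metastore; check that hive-site.xml
// sits in the conf directory of the Spark install this spark-shell
// belongs to (run 'which spark-shell' to see which one is on your PATH).
hiveCtx.sql("show databases").collect().foreach(println)
println(hiveCtx.tableNames("default").mkString(", "))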


Re: how to access the hive tables from spark-shell

New Contributor

Hi!

 

Have you installed the appropriate Gateways on the server where these configuration settings are required?
