Hive databases are not visible in Spark session.

Hi,

I am trying to run a Spark application that needs access to Hive databases, but Hive databases such as FOODMART are not visible in the Spark session.

I ran spark.sql("show databases").show(); it does not list the Foodmart database, even though the Spark session was created with enableHiveSupport.


Here is what I've tried:

1)

cp /etc/hive/conf/hive-site.xml /etc/spark2/conf

2)

Changed spark.sql.warehouse.dir in the Spark UI from /apps/spark/warehouse to /warehouse/tablespace/managed/hive

Even so, it is still not working.


Please let me know what configuration changes are required to make this work.


Please note: the above works on HDP 2.6.5.

Accepted Solutions

Re: Hive databases are not visible in Spark session.

Contributor

Hi @Shashank Naresh,


It's not clear which version you are currently running; I'll assume HDP3. If that is the case, you may want to read the following links, along with the links inside them:

Spark to Hive access on HDP3 - https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouse...

Configuration - https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_configure_a_s...

API Operations - https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouse...

In short, Spark has its own catalog, meaning that you will not natively have access to the Hive catalog as you did on HDP2.


BR,

David Bompart


Re: Hive databases are not visible in Spark session.

I am using HDP 3.1.0

Re: Hive databases are not visible in Spark session.

Hi @dbompart,

Thanks for the answer,

I am using HDP3.1,

I've tried to change the settings as per the link "https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_configure_a_s..."

1) Spark settings below:

110334-screenshot-from-2019-08-13-09-22-34.png

2) Trying to get hive databases in spark - no success;

110335-screenshot-from-2019-08-13-09-23-29.png

3) Can see hive databases in hive

110336-screenshot-from-2019-08-13-09-25-52.png


Could you please assist me with this? What else needs to be done?

Re: Hive databases are not visible in Spark session.

Contributor

Hey Shashank,


You're still skipping the link to: API Operations - https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.1.0/integrating-hive/content/hive_hivewarehouse...


Listing Hive databases from Spark through the SparkSQL API will not work as long as metastore.catalog.default is set to "spark", which is the default value and is best left as it is. To summarize: by default the SparkSQL API (spark.sql("$query")) accesses the Spark catalog. Instead, you should use the HiveWarehouseSession API as explained in the link above, something like:


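import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._

// Build a Hive Warehouse Connector session on top of the existing SparkSession.
val hive = HiveWarehouseSession.session(spark).build()
// List the databases in the Hive catalog (not the Spark catalog).
hive.showDatabases().show()
hive.setDatabase("foodmart")
hive.showTables().show()
// foodmartTable is a placeholder; substitute a real table name.
hive.execute("describe formatted foodmartTable").show()
hive.executeQuery("select * from foodmartTable limit 5").show()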
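
Editor's note: the snippet above only works if the connector properties from the Configuration link are actually set on the session. As a rough sketch, the helper below reports which of those properties are missing. The property names are recalled from the HDP 3.1 documentation and should be treated as assumptions to verify against that page; requiredHwcKeys and missingHwcKeys are hypothetical names introduced here for illustration.

```scala
// Property names are assumptions recalled from the HDP 3.1 HWC documentation;
// verify them against the Configuration link before relying on this list.
val requiredHwcKeys: Seq[String] = Seq(
  "spark.sql.hive.hiveserver2.jdbc.url",
  "spark.datasource.hive.warehouse.metastoreUri",
  "spark.datasource.hive.warehouse.load.staging.dir",
  "spark.hadoop.hive.llap.daemon.service.hosts",
  "spark.hadoop.hive.zookeeper.quorum"
)

/** Returns the required keys that are absent or blank in the given configuration. */
def missingHwcKeys(conf: Map[String, String]): Seq[String] =
  requiredHwcKeys.filter(key => conf.get(key).forall(_.trim.isEmpty))

// In spark-shell, where `spark` is the predefined SparkSession:
//   missingHwcKeys(spark.conf.getAll).foreach(k => println(s"missing: $k"))
```

An empty result only means the properties are present, not that their values are correct for your cluster.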

Re: Hive databases are not visible in Spark session.

Hi @dbompart,


Thanks for your reply;

I've tried the following:

1) Changed the Zeppelin settings as shown below

110382-screenshot-from-2019-08-13-22-51-21.png

2) Restarted notebook

3) Tried the code below in the notebook and got the import error shown below.

110383-screenshot-from-2019-08-13-22-52-36.png


Requesting your assistance here.

Thanks and Regards.

Re: Hive databases are not visible in Spark session.

Contributor

Zeppelin and spark-shell are not the same client, and their properties work differently. If you have moved on to Zeppelin, can we assume it worked in spark-shell?


Regarding the Zeppelin issue, the problem is probably the path to the Hive Warehouse Connector file in either spark.jars or spark.submit.pyFiles. I believe the path must be whitelisted in Zeppelin, but it's clear that the Hive Warehouse Connector files are not being successfully uploaded to the application classpath, and therefore the pyspark_llap module cannot be imported. Hope it helps.


BR,

David
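
Editor's note: one way to narrow down a classpath problem like the one described above is to ask the JVM whether it can load the connector class at all. The sketch below can be pasted into spark-shell or a Zeppelin Scala paragraph. classAvailable is a hypothetical helper name; the class name is the one imported earlier in this thread. It checks only the JVM side (spark.jars), so it says nothing about whether the Python pyspark_llap module distributed through spark.submit.pyFiles is importable.

```scala
import scala.util.Try

/** True if the named class can be loaded by the current JVM classloader. */
def classAvailable(className: String): Boolean =
  Try(Class.forName(className)).isSuccess

// Class name taken from the import shown earlier in this thread.
val hwcClass = "com.hortonworks.hwc.HiveWarehouseSession"

if (classAvailable(hwcClass))
  println(s"$hwcClass is on the classpath")
else
  println(s"$hwcClass was not found; check the jar path given in spark.jars")
```

If the class is not found in Zeppelin but is found in spark-shell, the difference most likely lies in how each client was given the jar path.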