
Spark2 shell is not displaying all Hive databases, only listing the default database



Super Collaborator

I am using HDP 3.0 with Hive LLAP.

I have pasted the code and output below:

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@19e03398

scala> sqlContext.sql("show databases").show()
+------------+
|databaseName|
+------------+
|     default|
+------------+

In the Hive shell, I am able to see all the databases:

0: jdbc:hive2://ip-10-0-10-76.amer.o9solution> show databases;
INFO : Compiling command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225): show databases
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225); Time taken: 0.003 seconds
INFO : Executing command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225); Time taken: 0.005 seconds
INFO : OK

+---------------------+
|    database_name    |
+---------------------+
| default             |
| information_schema  |
| rh_ml               |
| schema_7539         |
| sys                 |
+---------------------+

Any help to resolve this issue would be appreciated. Thank you in advance.

1 ACCEPTED SOLUTION


Re: Spark2 shell is not displaying all Hive databases. only listing default database

@subhash parise

From HDP 3.0 onwards, Spark has its own separate catalog. This is the reason why you don't see any Hive databases. To work with Hive databases you should use the HiveWarehouseConnector. Link to documentation:

https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_configure_a_s...
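As a minimal sketch of the connector in use (the hostnames and jar version below are placeholders; substitute your cluster's actual values), listing the Hive catalog from spark-shell looks roughly like this:

```scala
// Sketch only: hostnames and the jar version are placeholders.
// Launch spark-shell with the HiveWarehouseConnector on the classpath, e.g.:
//   spark-shell \
//     --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<hiveserver2-host>:10000/" \
//     --conf spark.datasource.hive.warehouse.metastoreUri="thrift://<metastore-host>:9083" \
//     --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar

// Build a HiveWarehouseSession; queries issued through it go to the
// Hive catalog (via HiveServer2/LLAP) rather than Spark's own catalog.
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.showDatabases().show(100, false)
```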

HTH

*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.

9 REPLIES

Re: Spark2 shell is not displaying all Hive databases. only listing default database

New Contributor

Thanks for posting this answer, Subhash.

I thought I was going crazy, because I was using the HDP 2.x sandbox before, and that's all one had to do to query Hive-created tables from Spark:

%pyspark
from pyspark.sql import HiveContext
hive_context = HiveContext(sc)
hive_context.sql('show tables').show()

Many posts out there say that not having ./conf/hive-site.xml may be the problem, but it DOES exist on the HDP 3.0.1 sandbox, while HiveContext still shows only Spark-created tables (both permanent and temporary).

So I have a follow-up question:

Is the value "thrift://sandbox-hdp.hortonworks.com:9083" correct for the property "hive.metastore.uris" in ./conf/hive-site.xml?
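For comparison, this is the shape of the entry I'd expect in ./conf/hive-site.xml on the sandbox (a sketch; the hostname and port must match whatever the metastore service is actually bound to):

```xml
<!-- Sketch of the metastore URI entry in conf/hive-site.xml.
     9083 is the metastore's conventional Thrift port; the hostname
     must resolve to the node running the metastore service. -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://sandbox-hdp.hortonworks.com:9083</value>
</property>
```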

Here's the entry for this host from /etc/hosts:

172.18.0.2      sandbox-hdp.hortonworks.com sandbox-hdp

Is this the IP address of one of the hosts in the virtual cluster? (I am using HDP Sandbox 3.0.1)

Changing the host IP address to 127.0.0.1 (localhost) results in a connection-refused error while trying to create and use the HiveContext, probably meaning that the Thrift server is NOT running on port 9083 of the sandbox VM?

Re: Spark2 shell is not displaying all Hive databases. only listing default database

Expert Contributor

Hi,

I'm facing the same issue. I have also added the following properties to the Spark configuration, but it still isn't working.

spark.sql.hive.hiveserver2.jdbc.url: jdbc:hive2://hive_server_FQDN:10000/
spark.datasource.hive.warehouse.metastoreUri: thrift://hive_server_FQDN:9083
spark.datasource.hive.warehouse.load.staging.dir: /tmp
spark.hadoop.hive.llap.daemon.service.hosts: @llap0
spark.hadoop.hive.zookeeper.quorum: hadoop3.test.com:2181
spark.sql.hive.hiveserver2.jdbc.url.principal: hive/_HOST@TEST.COM

Re: Spark2 shell is not displaying all Hive databases. only listing default database

Super Collaborator

Hi Vinay,

Does the spark user have sufficient privileges to access the Hive databases and tables?

Re: Spark2 shell is not displaying all Hive databases. only listing default database

Expert Contributor

Hi Subhash

I have already granted the spark user access to all the databases via Ranger, as well as to all the HDFS storage paths.

Re: Spark2 shell is not displaying all Hive databases. only listing default database

Super Collaborator

Hi Vinay,

Can you post the Spark code you are using to display the databases?

Re: Spark2 shell is not displaying all Hive databases. only listing default database

Expert Contributor

Hi Subhash,

Below is the code:

from pyspark import SparkConf
from pyspark.sql import SparkSession, HiveContext
from pyspark.sql import functions as fn
from pyspark.sql.functions import rank,sum,col
from pyspark.sql import Window
sparkSession = (SparkSession
                .builder
                .master("local")
                .appName('sprk-job')
                .enableHiveSupport()
                .getOrCreate())
sparkSession.sql("show databases").show()
sparkSession.stop()

I have also tried the same from spark-shell.

Re: Spark2 shell is not displaying all Hive databases. only listing default database

Super Collaborator

Hi Vinay,

Use the code below to connect to Hive and list the databases:

spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hiveserverip:10000/" \
  --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" \
  --conf spark.hadoop.hive.zookeeper.quorum="zookeeperquoremip:2181" \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()

hive.showDatabases().show(100, false)

Reference article

https://github.com/hortonworks-spark/spark-llap/tree/master

Re: Spark2 shell is not displaying all Hive databases. only listing default database

New Contributor

On Spark 2 and HDP 3.x, edit the file "/usr/hdp/3.1.4.0-315/spark2/conf/hive-site.xml" and remove the property below:

<property>
<name>metastore.catalog.default</name>
<value>spark</value>
</property>

After that, all databases are shown:

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@4e6881e

scala> sqlContext.sql("show databases").show();
Hive Session ID = edac02b0-c2f7-4cd9-919d-97bff977be3b
+------------------+
| databaseName|
+------------------+
| default|
|information_schema|
| sys|
| toy_store|
| website|
+------------------+
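An equivalent per-session workaround (a sketch, untested here; it assumes the same metastore.catalog.default property is what hides the Hive catalog) is to override the property when launching spark-shell instead of editing hive-site.xml:

```scala
// Sketch: point Spark at the "hive" catalog for this session only,
// by overriding metastore.catalog.default at launch:
//   spark-shell --conf spark.hadoop.metastore.catalog.default=hive
spark.sql("show databases").show()  // should now include the Hive catalog's databases
```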