
Spark2 shell is not displaying all Hive databases; it is only listing the default database

Super Collaborator

I am using HDP 3.0 with Hive LLAP.

I have pasted the code and output below:

scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@19e03398

scala> sqlContext.sql("show databases").show()
+------------+
|databaseName|
+------------+
|     default|
+------------+

In the Hive shell I am able to see all the databases:

0: jdbc:hive2://ip-10-0-10-76.amer.o9solution> show databases;
INFO : Compiling command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225): show databases
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225); Time taken: 0.003 seconds
INFO : Executing command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225); Time taken: 0.005 seconds
INFO : OK

+---------------------+
|    database_name    |
+---------------------+
| default             |
| information_schema  |
| rh_ml               |
| schema_7539         |
| sys                 |
+---------------------+

Any help to resolve this issue? Thank you in advance.

1 ACCEPTED SOLUTION


@subhash parise

From HDP 3.0 onwards, Spark has its own separate catalog. This is the reason why you don't see any Hive databases. To work with Hive databases you should use the Hive Warehouse Connector. Link to the documentation:

https://docs.hortonworks.com/HDPDocuments/HDP3/HDP-3.0.0/integrating-hive/content/hive_configure_a_s...
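To see the split in action, here is a minimal sketch of my own (not part of the original answer), run from spark-shell on HDP 3.x:

// The built-in session catalog on HDP 3.x is Spark's own, so this
// lists only databases created through Spark, not Hive-managed ones:
spark.catalog.listDatabases().show(truncate = false)
// Hive-managed databases are reached through the Hive Warehouse
// Connector instead, as described in the linked documentation.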

HTH

*** If you found this answer addressed your question, please take a moment to log in and click the "accept" link on the answer.


10 REPLIES



Thanks for posting this answer, Subhash.

I thought I was going crazy, because I was using the HDP 2.x sandbox before, and that's all one had to do to query Hive-created tables from Spark:

%pyspark
from pyspark.sql import HiveContext
hive_context = HiveContext(sc)
hive_context.sql('show tables').show()

Many posts out there say that not having ./conf/hive-site.xml may be the problem, but it DOES exist on the HDP 3.0.1 sandbox, while HiveContext still shows only Spark-created tables (both permanent and temporary).

So I have a follow-up question:

Is the value "thrift://sandbox-hdp.hortonworks.com:9083" correct for the property "hive.metastore.uris" in ./conf/hive-site.xml?

Here's the entry for this host from /etc/hosts:

172.18.0.2      sandbox-hdp.hortonworks.com sandbox-hdp

Is this the IP address of one of the hosts in the virtual cluster? (I am using HDP Sandbox 3.0.1)

Changing the host IP address to 127.0.0.1 (same as localhost) results in a "connection refused" error while trying to create and use HiveContext, probably meaning that the Thrift server is NOT running on port 9083 of the sandbox VM?
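One way to narrow this down, as a hypothetical check of my own (host and port taken from the hive.metastore.uris value above): test whether the Thrift port is reachable at all from where Spark runs. For example, from spark-shell:

import java.net.{InetSocketAddress, Socket}

// Reachability probe for the metastore Thrift endpoint (5 second timeout).
val socket = new Socket()
try {
  socket.connect(new InetSocketAddress("sandbox-hdp.hortonworks.com", 9083), 5000)
  println("metastore port 9083 is reachable")
} catch {
  case e: java.io.IOException => println(s"connection failed: ${e.getMessage}")
} finally {
  socket.close()
}

Also note that the 172.18.0.2 entry in /etc/hosts looks like the Docker-internal bridge address of the sandbox container, which would explain why pointing the URI at 127.0.0.1 is refused unless the port is actually forwarded to the host.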

Rising Star

Hi,

I'm facing the same issue. I have also added the following properties to the Spark configuration, but it's still not working.

spark.sql.hive.hiveserver2.jdbc.url: jdbc:hive2://hive_server_FQDN:10000/
spark.datasource.hive.warehouse.metastoreUri: thrift://hive_server_FQDN:9083
spark.datasource.hive.warehouse.load.staging.dir: /tmp
spark.hadoop.hive.llap.daemon.service.hosts: @llap0
spark.hadoop.hive.zookeeper.quorum: hadoop3.test.com:2181
spark.sql.hive.hiveserver2.jdbc.url.principal: hive/_HOST@TEST.COM
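One sanity check I'd add (hypothetical, not from the original reply): confirm from inside spark-shell that these values actually reached the session, which rules out typos or a config file that isn't being read:

// Print the HWC-related settings as the running session sees them;
// "<not set>" means the property never made it into the session.
Seq(
  "spark.sql.hive.hiveserver2.jdbc.url",
  "spark.datasource.hive.warehouse.metastoreUri",
  "spark.datasource.hive.warehouse.load.staging.dir",
  "spark.hadoop.hive.llap.daemon.service.hosts",
  "spark.hadoop.hive.zookeeper.quorum"
).foreach { k =>
  println(s"$k = ${spark.conf.getOption(k).getOrElse("<not set>")}")
}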

Super Collaborator

Hi Vinay,

Does the spark user have sufficient privileges to access the Hive databases and tables?

Rising Star

Hi Subhash

I have already granted the spark user access to all the databases through Ranger, as well as to all the HDFS storage paths.

Super Collaborator

Hi Vinay,

Can you post the Spark code you are using to display the databases?

Rising Star

Hi Subhash

Below is the code:

from pyspark.sql import SparkSession

# Plain SparkSession with Hive support enabled; on HDP 3.x this
# still reads Spark's own catalog, so only "default" comes back.
sparkSession = (SparkSession
                .builder
                .master("local")
                .appName('sprk-job')
                .enableHiveSupport()
                .getOrCreate())
sparkSession.sql("show databases").show()
sparkSession.stop()

I have also tried the same from spark-shell.

Super Collaborator

Hi Vinay,

Use the code below to connect to Hive and list the databases (note that each property needs its own --conf flag):

spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hiveserverip:10000/" \
  --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" \
  --conf spark.hadoop.hive.zookeeper.quorum="zookeeperquorumip:2181" \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()

hive.showDatabases().show(100, false)
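Once showDatabases() works, the same session object can query Hive tables; here is a short sketch based on the spark-llap API in the reference below (the table name is a made-up example):

// Point the session at a Hive database, list its tables,
// and run a query through LLAP into a Spark DataFrame.
hive.setDatabase("rh_ml")
hive.showTables().show(100, false)
// "my_table" is a placeholder; substitute a real table name.
val counts = hive.executeQuery("SELECT COUNT(*) AS cnt FROM my_table")
counts.show()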

Reference article

https://github.com/hortonworks-spark/spark-llap/tree/master

New Contributor

On Spark 2 and HDP 3.x, edit the file "/usr/hdp/3.1.4.0-315/spark2/conf/hive-site.xml" and remove the property below:

<property>
  <name>metastore.catalog.default</name>
  <value>spark</value>
</property>

After removing it, I can see all the databases:

scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@4e6881e

scala> sqlContext.sql("show databases").show();
Hive Session ID = edac02b0-c2f7-4cd9-919d-97bff977be3b
+------------------+
| databaseName|
+------------------+
| default|
|information_schema|
| sys|
| toy_store|
| website|
+------------------+