Spark2 shell is not displaying all Hive databases, only listing the default database
Labels: Apache Hive, Apache Spark
Created 09-24-2018 09:37 AM
I am using HDP 3.0 with Hive LLAP.
I have pasted the code and output below:
scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@19e03398
scala> sqlContext.sql("show databases").show()
+------------+
|databaseName|
+------------+
|     default|
+------------+
In the Hive shell I am able to see all the databases:
0: jdbc:hive2://ip-10-0-10-76.amer.o9solution> show databases;
INFO : Compiling command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225): show databases
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225); Time taken: 0.003 seconds
INFO : Executing command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225); Time taken: 0.005 seconds
INFO : OK
+---------------------+
| database_name |
+---------------------+
| default |
| information_schema |
| rh_ml |
| schema_7539 |
| sys |
+---------------------+
Any help resolving this issue would be appreciated. Thank you in advance.
Created 09-24-2018 12:25 PM
From HDP 3.0 onwards, Spark has its own separate catalog. This is why you don't see any Hive databases. To work with Hive databases you should use the HiveWarehouseConnector. Link to documentation:
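In short (a minimal sketch based on the connector usage shown later in this thread; the HiveServer2 host and assembly jar version are placeholders for your environment):

spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<hiveserver2_host>:10000/" \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar

// Build a HiveWarehouseSession so queries go to the Hive catalog, not Spark's own
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.showDatabases().show(100, false)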
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 05-06-2019 09:27 PM
Thanks for posting this answer, Subhash.
I thought I was going crazy, because I was using the HDP 2.x sandbox before, and that's all one had to do to query Hive-created tables from Spark:
%pyspark
from pyspark.sql import HiveContext
hive_context = HiveContext(sc)
hive_context.sql('show tables').show()
Many posts out there say that not having ./conf/hive-site.xml may be the problem, but it DOES exist on the HDP 3.0.1 sandbox, while HiveContext still shows only Spark-created tables (both permanent and temporary).
So I have a follow-up question:
Is the value "thrift://sandbox-hdp.hortonworks.com:9083" correct for the property "hive.metastore.uris" in ./conf/hive-site.xml?
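For reference, that entry would look like this (reconstructed from the values above, not verified against the sandbox):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://sandbox-hdp.hortonworks.com:9083</value>
</property>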
Here's the entry for this host from /etc/hosts:
172.18.0.2 sandbox-hdp.hortonworks.com sandbox-hdp
Is this the IP address of one of the hosts in the virtual cluster? (I am using HDP Sandbox 3.0.1)
Changing the host IP address to 127.0.0.1 (localhost) results in a "connection refused" error while trying to create and use HiveContext, which probably means the Thrift server is NOT running on port 9083 of the sandbox VM?
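One way to check whether anything is actually listening on the metastore port (a sketch, assuming shell access inside the sandbox VM):

# shows a LISTEN line for the metastore process if Thrift is up on 9083
netstat -tlnp | grep 9083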
Created 12-31-2018 09:13 AM
Hi,
I'm facing the same issue. I have also added the following properties to the Spark configuration, but it still isn't working:
spark.sql.hive.hiveserver2.jdbc.url: jdbc:hive2://hive_server_FQDN:10000/
spark.datasource.hive.warehouse.metastoreUri: thrift://hive_server_FQDN:9083
spark.datasource.hive.warehouse.load.staging.dir: /tmp
spark.hadoop.hive.llap.daemon.service.hosts: @llap0
spark.hadoop.hive.zookeeper.quorum: hadoop3.test.com:2181
spark.sql.hive.hiveserver2.jdbc.url.principal: hive/_HOST@TEST.COM
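Note that even with these properties set, spark.sql("show databases") still reads Spark's own catalog; on HDP 3.x the Hive databases only appear through a HiveWarehouseConnector session (a minimal sketch, same API as in the reply below):

// Requires launching with the HWC assembly jar on the classpath
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.showDatabases().show(100, false)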
Created 12-31-2018 09:16 AM
Hi Vinay,
Does the spark user have sufficient privileges to access the Hive databases and tables?
Created 01-02-2019 12:09 PM
Hi Subhash
I have already granted the spark user access to all databases via Ranger and to all HDFS storage paths.
Created 01-02-2019 12:21 PM
Hi Vinay,
Can you post the Spark code you are using to display the databases?
Created 01-02-2019 12:54 PM
Hi Subhash,
Below is the code:
from pyspark import SparkConf
from pyspark.sql import SparkSession, HiveContext
from pyspark.sql import functions as fn
from pyspark.sql.functions import rank, sum, col
from pyspark.sql import Window

sparkSession = (SparkSession
    .builder
    .master("local")
    .appName('sprk-job')
    .enableHiveSupport()
    .getOrCreate())

sparkSession.sql("show databases").show()
sparkSession.stop()
I am also trying this from spark-shell.
Created 01-07-2019 06:16 AM
Hi Vinay,
Use the code below to connect to Hive and list the databases:
spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hiveserverip:10000/" \
  --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" \
  --conf spark.hadoop.hive.zookeeper.quorum="zookeeperquorumip:2181" \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.showDatabases().show(100, false)
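Once the session is built, the other HiveWarehouseSession calls work the same way (a sketch; "rh_ml" is one of the databases from the question and "my_table" is a hypothetical table name):

hive.setDatabase("rh_ml")             // switch to a Hive database
hive.showTables().show()              // list its tables
hive.executeQuery("select * from my_table limit 10").show()   // read via LLAP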
Reference article
Created on 10-11-2019 03:03 AM - edited 10-11-2019 03:06 AM
On Spark 2 and HDP 3.x, edit the file "/usr/hdp/3.1.4.0-315/spark2/conf/hive-site.xml" and remove the property below:
<property>
  <name>metastore.catalog.default</name>
  <value>spark</value>
</property>
After that, I can see all the databases:
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@4e6881e
scala> sqlContext.sql("show databases").show();
Hive Session ID = edac02b0-c2f7-4cd9-919d-97bff977be3b
+------------------+
| databaseName|
+------------------+
| default|
|information_schema|
| sys|
| toy_store|
| website|
+------------------+
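The same effect can also be had per session, without editing hive-site.xml (a sketch; note this points Spark's built-in reader at the Hive catalog and bypasses the HiveWarehouseConnector, so managed ACID tables still require HWC):

spark-shell --conf spark.hadoop.metastore.catalog.default=hive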
