Spark2 shell is not displaying all Hive databases, only listing the default database
Labels: Apache Hive, Apache Spark
Created 09-24-2018 09:37 AM
I am using HDP 3.0 with Hive LLAP.
I have pasted the code and output below:
scala> import org.apache.spark.sql.hive.HiveContext
import org.apache.spark.sql.hive.HiveContext
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@19e03398
scala> sqlContext.sql("show databases").show()
+------------+
|databaseName|
+------------+
|     default|
+------------+
In the Hive shell I am able to see all the databases:
0: jdbc:hive2://ip-10-0-10-76.amer.o9solution> show databases;
INFO : Compiling command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225): show databases
INFO : Semantic Analysis Completed (retrial = false)
INFO : Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:database_name, type:string, comment:from deserializer)], properties:null)
INFO : Completed compiling command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225); Time taken: 0.003 seconds
INFO : Executing command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225): show databases
INFO : Starting task [Stage-0:DDL] in serial mode
INFO : Completed executing command(queryId=hive_20180924093400_b66c3d0c-8e76-4a16-aed7-771fcae43225); Time taken: 0.005 seconds
INFO : OK
+---------------------+
| database_name |
+---------------------+
| default |
| information_schema |
| rh_ml |
| schema_7539 |
| sys |
+---------------------+
Any help resolving this issue would be appreciated. Thank you in advance.
Created 09-24-2018 12:25 PM
From HDP 3.0 onwards, Spark has its own separate catalog. This is why you don't see any Hive databases. To work with Hive databases you should use the HiveWarehouseConnector. Link to documentation:
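In short (a minimal sketch based on the connector usage shown later in this thread; the HiveServer2 host and assembly jar version are placeholders for your environment):

spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://<hiveserver2_host>:10000/" \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar

// Build a HiveWarehouseSession so queries go to the Hive catalog, not Spark's own
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.showDatabases().show(100, false)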
HTH
*** If you found this answer addressed your question, please take a moment to login and click the "accept" link on the answer.
Created 05-06-2019 09:27 PM
Thanks for posting this answer, Subhash.
I thought I was going crazy, because I was using the HDP 2.x sandbox before, and that's all one had to do to query Hive-created tables from Spark:
%pyspark
from pyspark.sql import HiveContext
hive_context = HiveContext(sc)
hive_context.sql('show tables').show()
Many posts out there say that not having ./conf/hive-site.xml may be the problem, but it DOES exist on the HDP 3.0.1 sandbox, while HiveContext still shows only Spark-created tables (both permanent and temporary).
So I have a follow-up question:
Is the value "thrift://sandbox-hdp.hortonworks.com:9083" correct for the property "hive.metastore.uris" in ./conf/hive-site.xml?
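For reference, that entry would look like this (reconstructed from the values above, not verified against the sandbox):

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://sandbox-hdp.hortonworks.com:9083</value>
</property>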
Here's the entry for this host from /etc/hosts:
172.18.0.2 sandbox-hdp.hortonworks.com sandbox-hdp
Is this the IP address of one of the hosts in the virtual cluster? (I am using HDP Sandbox 3.0.1)
Changing the host IP address to 127.0.0.1 (localhost) results in a "connection refused" error while trying to create and use HiveContext, which probably means the Thrift server is NOT running on port 9083 of the sandbox VM?
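One way to check whether anything is actually listening on the metastore port (a sketch, assuming shell access inside the sandbox VM):

# shows a LISTEN line for the metastore process if Thrift is up on 9083
netstat -tlnp | grep 9083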
Created 12-31-2018 09:13 AM
Hi,
I'm facing the same issue. I have also added the following properties to the Spark configuration, but it still isn't working:
spark.sql.hive.hiveserver2.jdbc.url: jdbc:hive2://hive_server_FQDN:10000/
spark.datasource.hive.warehouse.metastoreUri: thrift://hive_server_FQDN:9083
spark.datasource.hive.warehouse.load.staging.dir: /tmp
spark.hadoop.hive.llap.daemon.service.hosts: @llap0
spark.hadoop.hive.zookeeper.quorum: hadoop3.test.com:2181
spark.sql.hive.hiveserver2.jdbc.url.principal: hive/_HOST@TEST.COM
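Note that even with these properties set, spark.sql("show databases") still reads Spark's own catalog; on HDP 3.x the Hive databases only appear through a HiveWarehouseConnector session (a minimal sketch, same API as in the reply below):

// Requires launching with the HWC assembly jar on the classpath
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.showDatabases().show(100, false)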
Created 12-31-2018 09:16 AM
Hi Vinay,
Does the spark user have sufficient privileges to access the Hive databases and tables?
Created 01-02-2019 12:09 PM
Hi Subhash
I have already granted the spark user access to all databases via Ranger and to all HDFS storage paths.
Created 01-02-2019 12:21 PM
Hi Vinay,
Can you post the Spark code you are using to display the databases?
Created 01-02-2019 12:54 PM
Hi Subhash,
Below is the code:
from pyspark import SparkConf
from pyspark.sql import SparkSession, HiveContext
from pyspark.sql import functions as fn
from pyspark.sql.functions import rank, sum, col
from pyspark.sql import Window

sparkSession = (SparkSession
    .builder
    .master("local")
    .appName('sprk-job')
    .enableHiveSupport()
    .getOrCreate())

sparkSession.sql("show databases").show()
sparkSession.stop()
I am also trying this from spark-shell.
Created 01-07-2019 06:16 AM
Hi Vinay,
Use the code below to connect to Hive and list the databases:
spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hiveserverip:10000/" \
  --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" \
  --conf spark.hadoop.hive.zookeeper.quorum="zookeeperquorumip:2181" \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()
hive.showDatabases().show(100, false)
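Once the session is built, the other HiveWarehouseSession calls work the same way (a sketch; "rh_ml" is one of the databases from the question and "my_table" is a hypothetical table name):

hive.setDatabase("rh_ml")             // switch to a Hive database
hive.showTables().show()              // list its tables
hive.executeQuery("select * from my_table limit 10").show()   // read via LLAP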
Reference article
Created on 10-11-2019 03:03 AM - edited 10-11-2019 03:06 AM
On Spark 2 and HDP 3.x, edit the file "/usr/hdp/3.1.4.0-315/spark2/conf/hive-site.xml" and remove the property below:
<property>
  <name>metastore.catalog.default</name>
  <value>spark</value>
</property>
After that, I can see all the databases:
scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
warning: there was one deprecation warning; re-run with -deprecation for details
sqlContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@4e6881e
scala> sqlContext.sql("show databases").show();
Hive Session ID = edac02b0-c2f7-4cd9-919d-97bff977be3b
+------------------+
| databaseName|
+------------------+
| default|
|information_schema|
| sys|
| toy_store|
| website|
+------------------+
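The same effect can also be had per session, without editing hive-site.xml (a sketch; note this points Spark's built-in reader at the Hive catalog and bypasses the HiveWarehouseConnector, so managed ACID tables still require HWC):

spark-shell --conf spark.hadoop.metastore.catalog.default=hive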
