Spark unable to connect Hive database in HDP 3.0.1

Expert Contributor

Hi Folks,

Hope you are all doing well!

I have successfully upgraded HDP 2.6.5 to HDP 3.0.1.0-187. Now I'm trying to connect to Hive databases using spark-shell, but I'm unable to see any Hive databases. I even copied /etc/hive/conf/hive-site.xml to /etc/spark2/conf/ and restarted the Spark service. After the restart, hive-site.xml reverted to the original XML file.

Is there any alternative solution to resolve the issue?

Kindly help me fix the issue.

1 ACCEPTED SOLUTION

Super Collaborator

Hi Vinay,

Use the code below to connect to Hive and list the databases:

spark-shell --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hiveserverip:10000/" --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" --conf spark.hadoop.hive.zookeeper.quorum="zookeeperquorumip:2181" --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.0.0.0-1634.jar

val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(spark).build()

hive.showDatabases().show(100, false)

Reference article

https://github.com/hortonworks-spark/spark-llap/tree/master
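
Once the session builds, the same `hive` object can be used for more than listing databases. A minimal sketch, assuming it is pasted into a spark-shell started as above (so `spark` already exists and the connector jar is on the classpath); `my_db` and `my_table` are hypothetical names, not taken from this thread:

```scala
import com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder

// Build the Hive Warehouse Connector session from the running SparkSession
val hive = HiveWarehouseBuilder.session(spark).build()

// Switch to a Hive database (hypothetical name) and list its tables
hive.setDatabase("my_db")
hive.showTables().show(100, false)

// Run a query through the connector; the result is a regular Spark DataFrame
val df = hive.executeQuery("SELECT * FROM my_table LIMIT 10")
df.show(false)
```

Note that a plain `sql("show databases")` still goes to Spark's own catalog, so it would keep showing only `default`; the Hive databases are reached through `hive`.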


33 REPLIES

Expert Contributor

@Geoffrey Shelton Okot

Oh, I had not enabled pre-emption via the YARN config. That is the only point still pending; I have completed the rest.

Let me check with YARN pre-emption enabled. I will update you once it is done.

Expert Contributor

@Geoffrey Shelton Okot

No luck. Pre-emption is already enabled via the YARN config and all other prerequisites are complete. The Hive interactive query service is running fine. Still, only the default database is listed:

19/01/03 05:16:45 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=vinay@TEST.COM (auth:KERBEROS) retries=1 delay=5 lifetime=0 
19/01/03 05:16:47 INFO CodeGenerator: Code generated in 294.781928 ms
19/01/03 05:16:47 INFO CodeGenerator: Code generated in 18.011739 ms
+------------+
|databaseName|
+------------+
|     default|
+------------+

Mentor

@Vinay
So now the interactive query service is running fine and no longer throws errors, but you can't see any databases other than "default"?

In HDP 3.0, Spark uses its own separate catalog, which explains why you can't see any Hive databases. To work with Hive databases you should use the HiveWarehouseConnector. Please follow this documentation: Configuring HiveWarehouseConnector

Please let me know how it goes.

HTH


Expert Contributor

@Geoffrey Shelton Okot

Yes, the interactive query service is running fine.

I have edited the properties below in the custom spark2-defaults configuration:

spark.sql.hive.hiveserver2.jdbc.url.principal

spark.hadoop.hive.zookeeper.quorum

spark.hadoop.hive.llap.daemon.service.hosts

spark.datasource.hive.warehouse.load.staging.dir

spark.datasource.hive.warehouse.metastoreUri

spark.sql.hive.hiveserver2.jdbc.url

After restarting, I ran spark-shell:

sql("show databases").show()

Still, only the default database is visible.

Mentor

@Vinay

Nice that it worked out. The solution wasn't far off!

Expert Contributor

We were almost there. Thanks again, @Geoffrey Shelton Okot

New Contributor

Try changing "metastore.catalog.default" from "spark" to "hive" in the Spark settings to see all Hive schemas.

New Contributor

Hi,

I followed all the configurations above and finally figured out that spark.hadoop.metastore.catalog.default was set to spark. If you change it to hive on the command line, as shown below, it shows all my Hive metastore catalog tables.

  • spark-shell --conf spark.hadoop.metastore.catalog.default=hive
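
For a standalone application rather than spark-shell, the same property can be passed while the session is being built. A minimal sketch, assuming Spark 2.x with Hive support on the classpath and a cluster hive-site.xml that Spark can find; the application name is made up. The property has to be in place before the session is created, which is why the spark-shell variant passes it on the command line.

```scala
import org.apache.spark.sql.SparkSession

// Point Spark at the "hive" catalog instead of its own "spark" catalog
val spark = SparkSession.builder()
  .appName("hive-catalog-check") // hypothetical name
  .config("spark.hadoop.metastore.catalog.default", "hive")
  .enableHiveSupport()
  .getOrCreate()

// Should now list the Hive databases, not only "default"
spark.sql("show databases").show(100, false)
```

This only changes which catalog Spark reads. As far as I know, Hive managed (transactional) tables in HDP 3 still need the HiveWarehouseConnector to be read correctly, so treat this as a way to see the schemas rather than a full replacement for HWC.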

Thanks

Naga

New Contributor

Huge thanks. It works for me.

New Contributor

Hi, I got the error below when developing HWC code on my local machine. Could you help me with the correct configuration for working with Spark in local mode?

Caused by: java.util.NoSuchElementException: spark.sql.hive.hiveserver2.jdbc.url

at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1571)

at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1571)


Dependencies I am using in pom.xml:

    <dependency>
        <groupId>com.hortonworks</groupId>
        <artifactId>spark-llap_2-11</artifactId>
        <version>1.0.2-2.1</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/com.hortonworks.hive/hive-warehouse-connector -->
    <dependency>
        <groupId>com.hortonworks.hive</groupId>
        <artifactId>hive-warehouse-connector_2.11</artifactId>
        <version>1.0.0.3.1.2.1-1</version>
    </dependency>

Code:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession
import com.hortonworks.hwc.HiveWarehouseSession

val sparkConfig = new SparkConf()

// Compression and parallelism settings for a small local run
sparkConfig.set("spark.broadcast.compress", "false")
sparkConfig.set("spark.shuffle.compress", "false")
sparkConfig.set("spark.shuffle.spill.compress", "false")
sparkConfig.set("spark.io.compression.codec", "lzf")
sparkConfig.set("spark.sql.catalogImplementation", "hive")
sparkConfig.set("hive.exec.dynamic.partition.mode", "nonstrict")
sparkConfig.set("spark.default.parallelism", "1")
sparkConfig.set("spark.sql.shuffle.partitions", "1")

// Hive / HWC settings (keys must not contain leading or trailing spaces)
sparkConfig.set("spark.sql.hive.llap", "true")
sparkConfig.set("spark.datasource.hive.warehouse.load.staging.dir", "/tmp")
sparkConfig.set("spark.hadoop.hive.llap.daemon.service.hosts", "@llap0")
// The ZooKeeper quorum is a comma-separated list of host:port pairs
sparkConfig.set("spark.hadoop.hive.zookeeper.quorum", "host1:2181,host2:2181,host3:2181")
sparkConfig.set("spark.hadoop.metastore.catalog.default", "hive")

val _spark: SparkSession = SparkSession.builder
  .master("local")
  .appName("Unit Test")
  .config(sparkConfig)
  .enableHiveSupport()
  .getOrCreate()

println("Spark Session Initialized")
val hive = HiveWarehouseSession.session(_spark).build()
hive.showDatabases().show(100, false)
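
The exception names a key, spark.sql.hive.hiveserver2.jdbc.url, that the SparkConf above never sets, so HWC has nothing to read when it looks the key up. One way to catch this before the session is built is a small pre-flight check over the settings. The sketch below is plain Scala and only an illustration: `HwcConfCheck` is a hypothetical helper, and treating every property listed earlier in this thread as required is an assumption. It also trims whitespace around keys, because a key with a stray space is silently treated as a different key.

```scala
// Hypothetical helper, not part of Spark or HWC
object HwcConfCheck {
  // Keys taken from the properties listed earlier in this thread;
  // treating all of them as required is an assumption.
  val expectedKeys: Seq[String] = Seq(
    "spark.sql.hive.hiveserver2.jdbc.url",
    "spark.datasource.hive.warehouse.load.staging.dir",
    "spark.hadoop.hive.zookeeper.quorum",
    "spark.hadoop.hive.llap.daemon.service.hosts",
    "spark.datasource.hive.warehouse.metastoreUri"
  )

  // Trim stray whitespace around keys and values
  def normalize(settings: Map[String, String]): Map[String, String] =
    settings.map { case (k, v) => k.trim -> v.trim }

  // Expected keys that are absent or blank after normalization
  def missingKeys(settings: Map[String, String]): Seq[String] = {
    val clean = normalize(settings)
    expectedKeys.filterNot(k => clean.get(k).exists(_.nonEmpty))
  }
}

// Example settings; two keys deliberately carry stray spaces
val posted = Map(
  "spark.datasource.hive.warehouse.load.staging.dir" -> "/tmp",
  "spark.hadoop.hive.llap.daemon.service.hosts" -> "@llap0",
  "spark.hadoop.hive.zookeeper.quorum " -> "host1:2181,host2:2181,host3:2181",
  " spark.hadoop.metastore.catalog.default" -> "hive"
)

// Report what still has to be supplied before building the session
println(HwcConfCheck.missingKeys(posted))
```

After adding the missing entries (for example a JDBC URL of the form jdbc:hive2://hiveserverip:10000/ from the accepted solution), the cleaned map can be applied with `sparkConfig.setAll(HwcConfCheck.normalize(posted))`. Whether HWC then works against a local master is not confirmed anywhere in this thread.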