
Zeppelin: Not able to connect to Hive databases (through spark2) on HDP 3.0

Contributor

I have installed Hortonworks HDP 3.0 and configured Zeppelin as well.

When I run Spark or SQL, Zeppelin only shows me the default database (this is Spark's default database, whose location is '/apps/spark/warehouse', not Hive's default database). This is probably because the hive.metastore.warehouse.dir property is not being picked up from hive-site.xml, and Zeppelin is instead taking the location from the Spark config (spark.sql.warehouse.dir).

I had a similar issue with Spark as well, caused by the hive-site.xml file in the spark-conf dir; I was able to resolve it by copying hive-site.xml from the hive-conf dir to the spark-conf dir.

I did the same for Zeppelin as well: copied hive-site.xml into the Zeppelin conf dir (where zeppelin-site.xml lives) and also into the zeppelin-external-dependency-conf dir.

But this did not resolve the issue.

*** Edit#1 - adding some additional information ***

I have created the Spark session with Hive support enabled through enableHiveSupport(), and even tried setting the spark.sql.warehouse.dir config property, but this did not help.

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("Test Zeppelin").config("spark.sql.warehouse.dir", "/apps/hive/db").enableHiveSupport().getOrCreate()

Through some online help, I learnt that Zeppelin uses only Spark's hive-site.xml file. But I can view all Hive databases through Spark; it is only in Zeppelin (through spark2) that I am not able to access the Hive databases.

Additionally, Zeppelin is not letting me choose the programming language; by default it creates a session with Scala. I would prefer a Zeppelin session with PySpark.

Any help on this will be highly appreciated.
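(A minimal check, sketched in Scala, of which warehouse directory and databases the session actually sees; it can be run in a %spark2 Zeppelin paragraph or in spark-shell, and the paths are the ones from this post.)

// Run in a Zeppelin %spark2 paragraph or in spark-shell, where `spark` is predefined.
// Prints the warehouse directory the session resolved and lists the databases its catalog exposes.
println(spark.conf.get("spark.sql.warehouse.dir"))   // set to /apps/hive/db above, but the session still reports /apps/spark/warehouse
spark.catalog.listDatabases().show(false)
spark.sql("SHOW DATABASES").show(false)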


Contributor

After copying hive-site.xml from the hive-conf dir to the spark-conf dir, I restarted the Spark services, which reverted those changes. I copied hive-site.xml again and it's working now.


cp /etc/hive/conf/hive-site.xml /etc/spark2/conf

New Contributor

I'm having the same issue; both Spark and Zeppelin are unable to read the Hive metastore.

Your solution is not working. Any idea?

spark@amb1:/root$ cp /etc/hive/conf/hive-site.xml /etc/spark2/conf

spark@amb1:/root$ spark-sql

SPARK_MAJOR_VERSION is set to 2, using Spark2

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.metastore.warehouse.external.dir does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.use.ssl does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.heapsize does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.materializedview.rewriting.incremental does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.cors.allowed.headers does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.driver.parallel.compilation does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.tez.bucket.pruning does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.load.data.owner does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.execution.mode does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.service.metrics.codahale.reporter.classes does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.strict.managed.tables does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.create.as.insert.only does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.optimize.dynamic.partition.hashjoin does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.enable.cors does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.metastore.db.type does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.txn.strict.locking.mode does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.metastore.transactional.event.listeners does not exist

18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.tez.input.generate.consistent.splits does not exist

18/09/18 15:15:49 INFO metastore: Trying to connect to metastore with URI thrift://host:9083

18/09/18 15:15:49 INFO metastore: Connected to metastore.

18/09/18 15:15:50 INFO SessionState: Created local directory: /tmp/6dfdc844-1cfc-4aa7-bb55-86df23ab989e_resources

18/09/18 15:15:50 INFO SessionState: Created HDFS directory: /tmp/hive/spark/6dfdc844-1cfc-4aa7-bb55-86df23ab989e

18/09/18 15:15:50 INFO SessionState: Created local directory: /tmp/spark/6dfdc844-1cfc-4aa7-bb55-86df23ab989e

18/09/18 15:15:50 INFO SessionState: Created HDFS directory: /tmp/hive/spark/6dfdc844-1cfc-4aa7-bb55-86df23ab989e/_tmp_space.db

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning

at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:529)

at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:133)

at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:498)

at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)

at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)

at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)

at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)

at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)

at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.SessionNotRunning

at java.net.URLClassLoader.findClass(URLClassLoader.java:381)

at java.lang.ClassLoader.loadClass(ClassLoader.java:424)

at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)

at java.lang.ClassLoader.loadClass(ClassLoader.java:357)

... 13 more

18/09/18 15:15:51 INFO ShutdownHookManager: Shutdown hook called

18/09/18 15:15:51 INFO ShutdownHookManager: Deleting directory /tmp/spark-1521e135-c26e-4aed-b818-2c1512835709

New Contributor

I have the same issue:

spark@amb1:/root$ hadoop fs -ls /apps/spark/warehouse

Found 1 items

drwxr-xr-x   - hive hdfs   0 2018-09-18 00:15 /apps/spark/warehouse/stock_etf_crypto.db

spark-sql

18/09/18 15:30:09 INFO HiveClientImpl: Warehouse location for Hive client (version 3.0.0) is /apps/spark/warehouse

18/09/18 15:30:10 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://amb1.megapro.com:9083

18/09/18 15:30:10 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1

18/09/18 15:30:10 INFO HiveMetaStoreClient: Connected to metastore.

18/09/18 15:30:10 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=spark (auth:SIMPLE) retries=1 delay=5 lifetime=0

18/09/18 15:30:11 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint

spark-sql> show databases;

18/09/18 15:30:39 INFO CodeGenerator: Code generated in 266.18818 ms

default

Time taken: 1.215 seconds, Fetched 1 row(s)

Contributor

@Tongzhou Zhou

Try this:

1. Ensure hive-site.xml in the hive-conf dir and in the spark-conf dir are identical; the command below should not return anything.

diff /etc/hive/conf/hive-site.xml /etc/spark2/conf/hive-site.xml

2. Start a Spark REPL session (pyspark or spark-shell)

$pyspark

3. Show hive databases

spark.sql("show databases")

Are you able to access hive tables now?

New Contributor

My friend from Hortonworks told me that in HDP 3.0 Spark and Hive each use their own catalog, and the two are not visible to each other. As a result we have to manage Spark and Hive databases separately.
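(To see this separation concretely, a small Scala sketch; the database name is made up for illustration. A database created through the Spark session lands under spark.sql.warehouse.dir and is not listed by Hive's own catalog, e.g. from beeline, and Hive-created databases are likewise not listed by the Spark session.)

// In spark-shell or a %spark2 paragraph: this database goes into Spark's catalog/warehouse only.
spark.sql("CREATE DATABASE IF NOT EXISTS spark_only_db")   // hypothetical name
spark.sql("SHOW DATABASES").show(false)                    // listed here, but SHOW DATABASES from beeline will not show it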

Contributor

@Tongzhou Zhou


Sorry for the delayed response.

After copying hive-site.xml from the hive-conf dir to the spark-conf dir, I am able to access Hive databases from pyspark and spark-shell, but I am also getting the same error while initiating a spark-sql session.

Did you find the best way to use Hive databases across all the Spark entry points (spark-sql, pyspark, spark-shell, spark-submit, etc.)?

Contributor

@Tongzhou Zhou

I copied hive-site.xml from the Hive conf directory (/etc/hive/conf) to /etc/spark2/conf and then removed the properties below from /etc/spark2/conf/hive-site.xml.

It's working now; I can see the Hive databases in Spark (pyspark, spark-shell, spark-sql, etc.). A small sketch for scripting the removal is at the end of this reply.

hive.tez.cartesian-product.enabled 
hive.metastore.warehouse.external.dir 
hive.server2.webui.use.ssl 
hive.heapsize 
hive.server2.webui.port 
hive.materializedview.rewriting.incremental 
hive.server2.webui.cors.allowed.headers 
hive.driver.parallel.compilation 
hive.tez.bucket.pruning 
hive.hook.proto.base-directory 
hive.load.data.owner 
hive.execution.mode 
hive.service.metrics.codahale.reporter.classes 
hive.strict.managed.tables 
hive.create.as.insert.only 
hive.optimize.dynamic.partition.hashjoin 
hive.server2.webui.enable.cors 
hive.metastore.db.type 
hive.txn.strict.locking.mode 
hive.metastore.transactional.event.listeners 
hive.tez.input.generate.consistent.splits 

Can you please try this and let me know if you still face this issue?
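(If you would rather script the cleanup than edit the file by hand, here is a rough Scala sketch using scala.xml. The input path and the exact property set are assumptions; extend the set with the remaining names from the list above as needed.)

import scala.xml.{Elem, Node, XML}
import scala.xml.transform.{RewriteRule, RuleTransformer}

// Names of <property> entries to drop (a subset of the list above; add the rest as needed).
val unsupported = Set(
  "hive.tez.cartesian-product.enabled",
  "hive.metastore.warehouse.external.dir",
  "hive.strict.managed.tables",
  "hive.create.as.insert.only",
  "hive.metastore.db.type",
  "hive.load.data.owner")

// Rule that removes any <property> whose <name> is in the unsupported set.
val dropUnsupported = new RuleTransformer(new RewriteRule {
  override def transform(n: Node): Seq[Node] = n match {
    case e: Elem if e.label == "property" && unsupported.contains((e \ "name").text.trim) => Seq.empty
    case other => other
  }
})

// Load the copied file, strip the offending properties, and write it back.
val cleaned = dropUnsupported(XML.loadFile("/etc/spark2/conf/hive-site.xml"))
XML.save("/etc/spark2/conf/hive-site.xml", cleaned, "UTF-8", xmlDecl = true)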

Explorer

Hello community,

I am facing the same problem as @Shantanu Sharma: I am able to access Hive databases from pyspark and spark-shell, but I am also getting the same error with spark-sql.

Is there any update on this issue?

Thanks in advance

Contributor

@Christos Stefanopoulos

HDP 3.0 integrates Apache Hive with Apache Spark in a different way, using the Hive Warehouse Connector.

The article below explains the steps:

https://community.hortonworks.com/content/kbentry/223626/integrating-apache-hive-with-apache-spark-h...
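(For orientation, a rough Scala sketch of what Hive Warehouse Connector usage looks like from spark-shell or a Zeppelin %spark2 paragraph. It assumes the HWC assembly jar is on the Spark classpath and the connector settings, such as spark.sql.hive.hiveserver2.jdbc.url, are configured as the article describes; the table name is hypothetical.)

import com.hortonworks.hwc.HiveWarehouseSession

// Build an HWC session on top of the existing SparkSession; its queries go to the Hive catalog.
val hive = HiveWarehouseSession.session(spark).build()

hive.showDatabases().show(false)   // Hive-catalog databases, not Spark's
hive.setDatabase("default")
hive.executeQuery("SELECT * FROM some_hive_table LIMIT 10").show()   // hypothetical table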