Created 09-15-2018 08:53 PM
I have installed Hortonworks HDP 3.0 and configured Zeppelin as well.
When I run Spark or SQL, Zeppelin shows me only the default database (this is Spark's default database, whose location is '/apps/spark/warehouse', not Hive's default database). This is probably because the hive.metastore.warehouse.dir property is not being picked up from hive-site.xml, and Zeppelin is instead using the Spark config (spark.sql.warehouse.dir).
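A quick way to see which warehouse location the session actually resolved (run from a pyspark session; the SET command just echoes the effective value, and an unset Hive key shows up as <undefined>):
# Minimal diagnostic sketch: on my setup the first line prints
# /apps/spark/warehouse instead of the Hive warehouse location.
print(spark.conf.get("spark.sql.warehouse.dir"))
spark.sql("SET hive.metastore.warehouse.dir").show(truncate=False)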
I had a similar issue with Spark itself, caused by the hive-site.xml file in the spark-conf dir; I was able to resolve it by copying hive-site.xml from the hive-conf dir to the spark-conf dir.
I did the same for Zeppelin: copied hive-site.xml into the zeppelin conf dir (where zeppelin-site.xml lives) and also into the zeppelin-external-dependency-conf dir.
But this did not resolve the issue.
*** Edit#1 - adding some additional information ***
I have created the Spark session with Hive support enabled through enableHiveSupport(), and even tried setting the spark.sql.warehouse.dir config property, but this did not help.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .appName("Test Zeppelin")
  .config("spark.sql.warehouse.dir", "/apps/hive/db")
  .enableHiveSupport()
  .getOrCreate()
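Note: spark.sql.warehouse.dir is a static SQL config, so setting it on the builder only takes effect if no session exists yet, and in Zeppelin the interpreter usually creates the session before any paragraph runs. A minimal pyspark sketch of forcing a fresh session (assuming nothing else in the note depends on the existing one):
from pyspark.sql import SparkSession

# Stop the session the Zeppelin interpreter created so that the static
# warehouse setting below is actually honored by a new session.
spark.stop()

spark = (SparkSession.builder
         .appName("Test Zeppelin")
         .config("spark.sql.warehouse.dir", "/apps/hive/db")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("show databases").show()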
Through some online help, I learned that Zeppelin uses only Spark's hive-site.xml file. But I can view all Hive databases through Spark itself; it's only in Zeppelin (through the spark2 interpreter) that I am not able to access the Hive databases.
Additionally, Zeppelin is not letting me choose the programming language; by default it creates a session with Scala. I would prefer a Zeppelin session with pyspark, as sketched below.
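For reference, this is the kind of paragraph I would like to be able to run (the %spark2.pyspark interpreter prefix is an assumption based on the HDP interpreter naming; on other setups it may just be %pyspark):
%spark2.pyspark
# The prefix selects the Python REPL of the spark2 interpreter group
# instead of the default Scala REPL; `spark` is provided by Zeppelin.
spark.sql("show databases").show()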
Any help on this will be highly appreciated
Created 09-17-2018 02:56 PM
After copying hive-site.xml from the hive-conf dir to the spark-conf dir, I restarted the Spark services, which reverted those changes; I copied hive-site.xml over again and it's working now.
cp /etc/hive/conf/hive-site.xml /etc/spark2/conf
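A quick verification after the copy (from a fresh pyspark session, so the new conf is picked up):
spark.sql("show databases").show()  # Hive databases should now appear, not just 'default'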
Created 09-19-2018 11:12 PM
I'm having the same issue: neither Spark nor Zeppelin is able to read the Hive metastore.
Your solution is not working for me; any ideas?
spark@amb1:/root$ cp /etc/hive/conf/hive-site.xml /etc/spark2/conf
spark@amb1:/root$ spark-sql
SPARK_MAJOR_VERSION is set to 2, using Spark2
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.metastore.warehouse.external.dir does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.use.ssl does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.heapsize does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.materializedview.rewriting.incremental does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.cors.allowed.headers does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.driver.parallel.compilation does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.tez.bucket.pruning does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.load.data.owner does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.execution.mode does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.service.metrics.codahale.reporter.classes does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.strict.managed.tables does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.create.as.insert.only does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.optimize.dynamic.partition.hashjoin does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.enable.cors does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.metastore.db.type does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.txn.strict.locking.mode does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.metastore.transactional.event.listeners does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.tez.input.generate.consistent.splits does not exist
18/09/18 15:15:49 INFO metastore: Trying to connect to metastore with URI thrift://host:9083
18/09/18 15:15:49 INFO metastore: Connected to metastore.
18/09/18 15:15:50 INFO SessionState: Created local directory: /tmp/6dfdc844-1cfc-4aa7-bb55-86df23ab989e_resources
18/09/18 15:15:50 INFO SessionState: Created HDFS directory: /tmp/hive/spark/6dfdc844-1cfc-4aa7-bb55-86df23ab989e
18/09/18 15:15:50 INFO SessionState: Created local directory: /tmp/spark/6dfdc844-1cfc-4aa7-bb55-86df23ab989e
18/09/18 15:15:50 INFO SessionState: Created HDFS directory: /tmp/hive/spark/6dfdc844-1cfc-4aa7-bb55-86df23ab989e/_tmp_space.db
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:529)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:133)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.SessionNotRunning
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
18/09/18 15:15:51 INFO ShutdownHookManager: Shutdown hook called
18/09/18 15:15:51 INFO ShutdownHookManager: Deleting directory /tmp/spark-1521e135-c26e-4aed-b818-2c1512835709
Created 09-18-2018 08:44 PM
I have the same issue
spark@amb1:/root$ hadoop fs -ls /apps/spark/warehouse
Found 1 items
drwxr-xr-x   - hive hdfs          0 2018-09-18 00:15 /apps/spark/warehouse/stock_etf_crypto.db
spark-sql
18/09/18 15:30:09 INFO HiveClientImpl: Warehouse location for Hive client (version 3.0.0) is /apps/spark/warehouse
18/09/18 15:30:10 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://amb1.megapro.com:9083
18/09/18 15:30:10 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
18/09/18 15:30:10 INFO HiveMetaStoreClient: Connected to metastore.
18/09/18 15:30:10 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=spark (auth:SIMPLE) retries=1 delay=5 lifetime=0
18/09/18 15:30:11 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
spark-sql> show databases;
18/09/18 15:30:39 INFO CodeGenerator: Code generated in 266.18818 ms
default
Time taken: 1.215 seconds, Fetched 1 row(s)
Created 09-19-2018 01:16 PM
Try this:
1. Ensure the hive-site.xml in the hive-conf dir and in the spark-conf dir are identical; the command below should not return anything:
diff /etc/hive/conf/hive-site.xml /etc/spark2/conf/hive-site.xml
2. Invoke a Spark REPL session (pyspark or spark-shell):
$pyspark
3. Show the Hive databases (.show() is needed so the REPL prints the rows, not just the DataFrame schema):
spark.sql("show databases").show()
Are you able to access hive tables now?
Created 09-19-2018 09:39 PM
My friend from Hortonworks told me that in HDP 3.0, Spark and Hive each use their own catalog, and neither is visible to the other. As a result, we have to manage Spark and Hive databases separately.
Created 10-11-2018 09:09 AM
Sorry for the delayed response.
After copying hive-site.xml from the hive-conf dir to the spark-conf dir, I am able to access Hive databases from pyspark and spark-shell, but I am still getting the same error when initiating a spark-sql session.
Did you find the best way to use Hive databases across all the Spark entry points (spark-sql, pyspark, spark-shell, spark-submit, etc.)?
Created 10-15-2018 09:31 AM
I copied hive-site.xml from the hive conf directory (/etc/hive/conf/hive-site.xml) to /etc/spark2/conf/ and then removed the properties below from /etc/spark2/conf/hive-site.xml.
It's working now; I can see the Hive databases in Spark (pyspark, spark-shell, spark-sql, etc.).
hive.tez.cartesian-product.enabled
hive.metastore.warehouse.external.dir
hive.server2.webui.use.ssl
hive.heapsize
hive.server2.webui.port
hive.materializedview.rewriting.incremental
hive.server2.webui.cors.allowed.headers
hive.driver.parallel.compilation
hive.tez.bucket.pruning
hive.hook.proto.base-directory
hive.load.data.owner
hive.execution.mode
hive.service.metrics.codahale.reporter.classes
hive.strict.managed.tables
hive.create.as.insert.only
hive.optimize.dynamic.partition.hashjoin
hive.server2.webui.enable.cors
hive.metastore.db.type
hive.txn.strict.locking.mode
hive.metastore.transactional.event.listeners
hive.tez.input.generate.consistent.splits
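If editing the file by hand is error-prone, here is a throwaway Python sketch that strips those properties (the path is taken from this thread; run it as a user that can write to the spark conf dir):
import xml.etree.ElementTree as ET

# Hive 3 properties that Spark's bundled Hive client does not understand;
# remove them from the copied hive-site.xml (full list from the post above).
UNSUPPORTED = {
    "hive.tez.cartesian-product.enabled", "hive.metastore.warehouse.external.dir",
    "hive.server2.webui.use.ssl", "hive.heapsize", "hive.server2.webui.port",
    "hive.materializedview.rewriting.incremental",
    "hive.server2.webui.cors.allowed.headers", "hive.driver.parallel.compilation",
    "hive.tez.bucket.pruning", "hive.hook.proto.base-directory",
    "hive.load.data.owner", "hive.execution.mode",
    "hive.service.metrics.codahale.reporter.classes", "hive.strict.managed.tables",
    "hive.create.as.insert.only", "hive.optimize.dynamic.partition.hashjoin",
    "hive.server2.webui.enable.cors", "hive.metastore.db.type",
    "hive.txn.strict.locking.mode", "hive.metastore.transactional.event.listeners",
    "hive.tez.input.generate.consistent.splits",
}

path = "/etc/spark2/conf/hive-site.xml"
tree = ET.parse(path)
root = tree.getroot()  # the <configuration> element
for prop in root.findall("property"):
    if prop.findtext("name") in UNSUPPORTED:
        root.remove(prop)
tree.write(path)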
Can you please try this and let me know if you still face this issue?
Created 10-29-2018 02:45 PM
Hello community,
I am facing the same problem as @Shantanu Sharma: I am able to access Hive databases from pyspark and spark-shell, but I am also getting the same error with spark-sql.
Is there any update on this issue?
Thanks in advance
Created 10-29-2018 03:16 PM
HDP 3.0 integrates Apache Hive with Apache Spark in a different way, using the Hive Warehouse Connector.
The article below explains the steps:
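In the meantime, a minimal pyspark sketch of the connector API (this assumes the HWC jar and the HiveServer2 JDBC URL settings from that article are already configured on the interpreter; the table name is a placeholder):
from pyspark_llap import HiveWarehouseSession

# Build an HWC session on top of the existing SparkSession; queries are
# routed through HiveServer2, so the Hive catalog becomes visible again.
hive = HiveWarehouseSession.session(spark).build()
hive.showDatabases().show()
hive.executeQuery("SELECT * FROM some_db.some_table LIMIT 10").show()  # placeholder table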