Zeppelin : Not able to connect Hive Databases (through spark2) HDP3.0
Created 09-15-2018 08:53 PM
I have installed Hortonworks HDP 3.0 and configured Zeppelin as well.
When I run Spark or SQL paragraphs, Zeppelin shows me only the default database. This is Spark's default database, located at '/apps/spark/warehouse', not Hive's default database. This is probably because the hive.metastore.warehouse.dir property from hive-site.xml is not being picked up, so Zeppelin falls back to the Spark config (spark.sql.warehouse.dir).
I had a similar issue with Spark itself, caused by the hive-site.xml file in the Spark conf dir. I resolved it by copying hive-site.xml from the Hive conf dir to the Spark conf dir.
I did the same for Zeppelin: I copied hive-site.xml into the Zeppelin conf dir (the one containing zeppelin-site.xml) and also into the zeppelin-external-dependency-conf dir.
But this did not resolve the issue.
*** Edit#1 - adding some additional information ***
I have created the Spark session with Hive support enabled through enableHiveSupport(), and I even tried setting the spark.sql.warehouse.dir config property, but this did not help.
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.appName("Test Zeppelin").config("spark.sql.warehouse.dir", "/apps/hive/db").enableHiveSupport().getOrCreate()
From some online reading, I learned that Zeppelin uses only Spark's hive-site.xml file. However, I can view all Hive databases through Spark directly; it is only in Zeppelin (through spark2) that I cannot access the Hive databases.
Additionally, Zeppelin does not let me choose the programming language; by default it creates the session with Scala. I would prefer a Zeppelin session with pyspark.
Any help on this will be highly appreciated.
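One way to see what the Zeppelin session is actually using is to print its catalog-related settings from a PySpark paragraph. The sketch below is only illustrative: `describe_session` is a hypothetical helper, and it assumes the `spark` session object that Zeppelin's PySpark interpreter provides (often bound as `%spark2.pyspark` on HDP, which is an assumption about the interpreter setup).

```python
def describe_session(spark):
    """Return the catalog-related settings of a SparkSession-like object.

    `spark` is expected to expose `conf.get(key, default)` and
    `sql(query).collect()`, as a PySpark SparkSession does.
    """
    # Each row of "show databases" carries the database name in column 0.
    databases = [row[0] for row in spark.sql("show databases").collect()]
    return {
        # "hive" means the session was built with Hive support enabled.
        "catalog": spark.conf.get("spark.sql.catalogImplementation", "unknown"),
        # Default location for managed tables created through Spark.
        "warehouse": spark.conf.get("spark.sql.warehouse.dir", "unknown"),
        "databases": databases,
    }
```

In a Zeppelin paragraph this would be called as `print(describe_session(spark))`. If `catalog` comes back as `in-memory`, the session has no Hive support at all; if it is `hive` but only `default` is listed, the session is probably talking to a different metastore or catalog than Hive itself.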
Created 09-17-2018 02:56 PM
After copying hive-site.xml from the Hive conf dir to the Spark conf dir, I restarted the Spark services, which reverted the change. I copied hive-site.xml again and it is working now.
cp /etc/hive/conf/hive-site.xml /etc/spark2/conf
Created 09-19-2018 11:12 PM
I'm having the same issue: neither Spark nor Zeppelin is able to read the Hive metastore.
Your solution is not working for me. Any ideas?
spark@amb1:/root$ cp /etc/hive/conf/hive-site.xml /etc/spark2/conf
spark@amb1:/root$ spark-sql
SPARK_MAJOR_VERSION is set to 2, using Spark2
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.metastore.warehouse.external.dir does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.use.ssl does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.heapsize does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.materializedview.rewriting.incremental does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.cors.allowed.headers does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.driver.parallel.compilation does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.tez.bucket.pruning does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.load.data.owner does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.execution.mode does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.service.metrics.codahale.reporter.classes does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.strict.managed.tables does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.create.as.insert.only does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.optimize.dynamic.partition.hashjoin does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.server2.webui.enable.cors does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.metastore.db.type does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.txn.strict.locking.mode does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.metastore.transactional.event.listeners does not exist
18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.tez.input.generate.consistent.splits does not exist
18/09/18 15:15:49 INFO metastore: Trying to connect to metastore with URI thrift://host:9083
18/09/18 15:15:49 INFO metastore: Connected to metastore.
18/09/18 15:15:50 INFO SessionState: Created local directory: /tmp/6dfdc844-1cfc-4aa7-bb55-86df23ab989e_resources
18/09/18 15:15:50 INFO SessionState: Created HDFS directory: /tmp/hive/spark/6dfdc844-1cfc-4aa7-bb55-86df23ab989e
18/09/18 15:15:50 INFO SessionState: Created local directory: /tmp/spark/6dfdc844-1cfc-4aa7-bb55-86df23ab989e
18/09/18 15:15:50 INFO SessionState: Created HDFS directory: /tmp/hive/spark/6dfdc844-1cfc-4aa7-bb55-86df23ab989e/_tmp_space.db
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:529)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:133)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.SessionNotRunning
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 13 more
18/09/18 15:15:51 INFO ShutdownHookManager: Shutdown hook called
18/09/18 15:15:51 INFO ShutdownHookManager: Deleting directory /tmp/spark-1521e135-c26e-4aed-b818-2c1512835709
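The WARN lines in this log name the properties that the Hive client bundled with Spark does not recognize. Collecting them from a saved log gives a list of candidates to review in Spark's copy of hive-site.xml. This is a minimal sketch, not a diagnosis of the NoClassDefFoundError itself; `unknown_hiveconf_names` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   18/09/18 15:15:49 WARN HiveConf: HiveConf of name hive.heapsize does not exist
_UNKNOWN_CONF = re.compile(r"WARN HiveConf: HiveConf of name (\S+) does not exist")


def unknown_hiveconf_names(log_text):
    """Return the property names reported as unknown, in first-seen order."""
    seen = []
    for match in _UNKNOWN_CONF.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen
```

To use it, save the spark-sql output to a file, read the file into a string, and pass that string to `unknown_hiveconf_names`.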
Created 09-18-2018 08:44 PM
I have the same issue
spark@amb1:/root$ hadoop fs -ls /apps/spark/warehouse
Found 1 items
drwxr-xr-x   - hive hdfs          0 2018-09-18 00:15 /apps/spark/warehouse/stock_etf_crypto.db
spark-sql
18/09/18 15:30:09 INFO HiveClientImpl: Warehouse location for Hive client (version 3.0.0) is /apps/spark/warehouse
18/09/18 15:30:10 INFO HiveMetaStoreClient: Trying to connect to metastore with URI thrift://amb1.megapro.com:9083
18/09/18 15:30:10 INFO HiveMetaStoreClient: Opened a connection to metastore, current connections: 1
18/09/18 15:30:10 INFO HiveMetaStoreClient: Connected to metastore.
18/09/18 15:30:10 INFO RetryingMetaStoreClient: RetryingMetaStoreClient proxy=class org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient ugi=spark (auth:SIMPLE) retries=1 delay=5 lifetime=0
18/09/18 15:30:11 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
spark-sql> show databases;
18/09/18 15:30:39 INFO CodeGenerator: Code generated in 266.18818 ms
default
Time taken: 1.215 seconds, Fetched 1 row(s)
Created 09-19-2018 01:16 PM
Try this:
1. Ensure that the hive-site.xml files in the Hive conf dir and the Spark conf dir are identical; the command below should print nothing.
diff /etc/hive/conf/hive-site.xml /etc/spark2/conf/hive-site.xml
2. Start a Spark REPL session (pyspark or spark-shell)
$ pyspark
3. Show Hive databases
spark.sql("show databases").show()
Are you able to access hive tables now?
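Because `diff` compares the files line by line, it also flags differences in ordering or whitespace. A property-level comparison can be sketched as follows; `load_props` and `compare_props` are hypothetical helper names, and the sketch assumes the usual Hadoop layout of `<property>` elements with `<name>` and `<value>` children.

```python
import xml.etree.ElementTree as ET


def load_props(path):
    """Parse a Hadoop-style configuration file into a {name: value} dict."""
    props = {}
    for prop in ET.parse(path).getroot().iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def compare_props(left, right):
    """Return (only_in_left, only_in_right, changed) for two property dicts."""
    only_left = sorted(set(left) - set(right))
    only_right = sorted(set(right) - set(left))
    changed = sorted(k for k in set(left) & set(right) if left[k] != right[k])
    return only_left, only_right, changed
```

With the paths from this thread, the call would be `compare_props(load_props("/etc/hive/conf/hive-site.xml"), load_props("/etc/spark2/conf/hive-site.xml"))`; three empty lists mean the two files define the same properties with the same values.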
Created 09-19-2018 09:39 PM
My friend from Hortonworks told me that in HDP 3.0 Spark and Hive each use their own catalog, and neither catalog is visible to the other. As a result, we have to manage Spark and Hive databases separately.
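On HDP 3.0 this separation appears to be controlled by the `metastore.catalog.default` property in Spark's copy of hive-site.xml. Treat both the property name and its typical value (`spark`) as assumptions to verify against your own cluster. A minimal sketch for checking it, with `catalog_of` as a hypothetical helper:

```python
import xml.etree.ElementTree as ET


def catalog_of(path, key="metastore.catalog.default"):
    """Return the value of `key` in a Hadoop-style config file, or None."""
    for prop in ET.parse(path).getroot().iter("property"):
        if (prop.findtext("name") or "").strip() == key:
            return (prop.findtext("value") or "").strip()
    return None
```

For example, `catalog_of("/etc/spark2/conf/hive-site.xml")` returning `spark` would suggest that Spark sessions are looking at the Spark catalog rather than Hive's, which would match seeing only the `default` database.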
Created 10-11-2018 09:09 AM
Sorry for the delayed response.
After copying hive-site.xml from the Hive conf dir to the Spark conf dir, I am able to access Hive databases from pyspark and spark-shell, but I also get the same error when starting a spark-sql session.
Did you find the best way to use Hive databases across all the Spark entry points (spark-sql, pyspark, spark-shell, spark-submit, etc.)?
Created 10-15-2018 09:31 AM
I copied hive-site.xml from the Hive conf directory (/etc/hive/conf) to /etc/spark2/conf and then removed the properties listed below from /etc/spark2/conf/hive-site.xml.
It's working now; I can see Hive databases in Spark (pyspark, spark-shell, spark-sql, etc.).
hive.tez.cartesian-product.enabled
hive.metastore.warehouse.external.dir
hive.server2.webui.use.ssl
hive.heapsize
hive.server2.webui.port
hive.materializedview.rewriting.incremental
hive.server2.webui.cors.allowed.headers
hive.driver.parallel.compilation
hive.tez.bucket.pruning
hive.hook.proto.base-directory
hive.load.data.owner
hive.execution.mode
hive.service.metrics.codahale.reporter.classes
hive.strict.managed.tables
hive.create.as.insert.only
hive.optimize.dynamic.partition.hashjoin
hive.server2.webui.enable.cors
hive.metastore.db.type
hive.txn.strict.locking.mode
hive.metastore.transactional.event.listeners
hive.tez.input.generate.consistent.splits
Can you please try this and let me know if you still face this issue?
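Removing that many properties by hand is error-prone, so the edit could be scripted. The following is a minimal sketch, assuming the usual Hadoop layout where each `<property>` is a direct child of `<configuration>`; `strip_properties` is a hypothetical helper, not part of any Hadoop or Spark tooling.

```python
import xml.etree.ElementTree as ET


def strip_properties(src, dst, names):
    """Copy the config at `src` to `dst`, omitting properties listed in `names`.

    Returns the names that were actually removed. Comments in the source
    file are not preserved by ElementTree's default parser.
    """
    tree = ET.parse(src)
    root = tree.getroot()
    removed = []
    # Materialize the list first so removal does not disturb iteration.
    for prop in list(root.findall("property")):
        name = (prop.findtext("name") or "").strip()
        if name in names:
            root.remove(prop)
            removed.append(name)
    tree.write(dst, encoding="utf-8", xml_declaration=True)
    return removed
```

A cautious way to use it is to write to a scratch path first, for example `strip_properties("/etc/hive/conf/hive-site.xml", "/tmp/hive-site.spark.xml", {"hive.heapsize", "hive.execution.mode"})`, review the result, and only then copy it into /etc/spark2/conf. As noted earlier in this thread, restarting the Spark services reverted a manually copied file, so the change may have to be reapplied afterwards.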
Created 10-29-2018 02:45 PM
Hello community,
I am facing the same problem as @Shantanu Sharma: I am able to access Hive databases from pyspark and spark-shell, but I also get the same error with spark-sql.
Is there any update on this issue?
Thanks in advance
Created 10-29-2018 03:16 PM
HDP 3.0 integrates Apache Hive with Apache Spark in a different way, through the Hive Warehouse Connector.
Below article explains the steps: