
spark-sql : Error in session initiation - NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning

Contributor

I am facing an issue while starting a spark-sql session.

Initially, when I started a Spark session, only the default database was visible (Spark's own default database, not Hive's).

To see the Hive databases, I copied hive-site.xml from the Hive conf directory to the Spark conf directory. After copying it, I get the error below.

$ spark-sql
WARN HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
WARN HiveConf: HiveConf of name hive.metastore.warehouse.external.dir does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.use.ssl does not exist
WARN HiveConf: HiveConf of name hive.heapsize does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
WARN HiveConf: HiveConf of name hive.materializedview.rewriting.incremental does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.cors.allowed.headers does not exist
WARN HiveConf: HiveConf of name hive.driver.parallel.compilation does not exist
WARN HiveConf: HiveConf of name hive.tez.bucket.pruning does not exist
WARN HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
WARN HiveConf: HiveConf of name hive.load.data.owner does not exist
WARN HiveConf: HiveConf of name hive.execution.mode does not exist
WARN HiveConf: HiveConf of name hive.service.metrics.codahale.reporter.classes does not exist
WARN HiveConf: HiveConf of name hive.strict.managed.tables does not exist
WARN HiveConf: HiveConf of name hive.create.as.insert.only does not exist
WARN HiveConf: HiveConf of name hive.optimize.dynamic.partition.hashjoin does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.enable.cors does not exist
WARN HiveConf: HiveConf of name hive.metastore.db.type does not exist
WARN HiveConf: HiveConf of name hive.txn.strict.locking.mode does not exist
WARN HiveConf: HiveConf of name hive.metastore.transactional.event.listeners does not exist
WARN HiveConf: HiveConf of name hive.tez.input.generate.consistent.splits does not exist
INFO metastore: Trying to connect to metastore with URI thrift://<host-name>:9083
INFO metastore: Connected to metastore.
INFO SessionState: Created local directory: /tmp/7b9d5455-e71a-4bd5-aa4b-385758b575a8_resources
INFO SessionState: Created HDFS directory: /tmp/hive/spark/7b9d5455-e71a-4bd5-aa4b-385758b575a8
INFO SessionState: Created local directory: /tmp/spark/7b9d5455-e71a-4bd5-aa4b-385758b575a8
INFO SessionState: Created HDFS directory: /tmp/hive/spark/7b9d5455-e71a-4bd5-aa4b-385758b575a8/_tmp_space.db
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:529)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:133)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:904)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.SessionNotRunning
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 13 more
INFO ShutdownHookManager: Shutdown hook called
INFO ShutdownHookManager: Deleting directory /tmp/spark-911cc8f5-f53b-4ae6-add3-0c745581bead
$

I can run pyspark and spark-shell sessions successfully, and the Hive databases are visible in both.

The error refers to Tez, and I confirmed that the Tez services are running fine. I can access Hive tables through hive2 without problems.

I am using HDP 3.0, where the Hive execution engine is Tez (MapReduce has been removed).

1 ACCEPTED SOLUTION

Contributor

I copied /etc/hive/hive-site.xml from the Hive conf directory to /etc/spark2/ and then removed the properties below from /etc/spark2/conf/hive-site.xml.

It's working now; I can see the Hive databases in Spark (pyspark, spark-shell, spark-sql, etc.).

hive.tez.cartesian-product.enabled 
hive.metastore.warehouse.external.dir 
hive.server2.webui.use.ssl 
hive.heapsize 
hive.server2.webui.port 
hive.materializedview.rewriting.incremental 
hive.server2.webui.cors.allowed.headers 
hive.driver.parallel.compilation 
hive.tez.bucket.pruning 
hive.hook.proto.base-directory 
hive.load.data.owner 
hive.execution.mode 
hive.service.metrics.codahale.reporter.classes 
hive.strict.managed.tables 
hive.create.as.insert.only 
hive.optimize.dynamic.partition.hashjoin 
hive.server2.webui.enable.cors 
hive.metastore.db.type 
hive.txn.strict.locking.mode 
hive.metastore.transactional.event.listeners 
hive.tez.input.generate.consistent.splits 

Do you see any consequences?
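
The removed properties are the same ones that spark-sql flagged in its "HiveConf of name ... does not exist" warnings, so the cleanup can be scripted. The following is only a minimal sketch, assuming the spark-sql output has been saved as text and the copied hive-site.xml uses the usual configuration/property/name layout. The function names flagged_properties and strip_properties are hypothetical helpers, not part of Spark or Hive.

```python
import re
import xml.etree.ElementTree as ET

# Matches lines such as:
#   WARN HiveConf: HiveConf of name hive.heapsize does not exist
WARN_PATTERN = re.compile(r"HiveConf of name (\S+) does not exist")


def flagged_properties(log_text):
    """Return the set of property names the log reports as unknown."""
    return set(WARN_PATTERN.findall(log_text))


def strip_properties(xml_text, names):
    """Return hive-site.xml content without the named <property> entries."""
    root = ET.fromstring(xml_text)
    # Iterate over a copy because entries are removed while looping.
    for prop in list(root.findall("property")):
        name = prop.findtext("name", default="").strip()
        if name in names:
            root.remove(prop)
    return ET.tostring(root, encoding="unicode")
```

To use it, read the captured spark-sql output and the copied hive-site.xml into strings, pass them through the two functions, and write the result back after keeping a backup of the original file. It mirrors the manual edit only; it does not tell you whether dropping a given property is safe for your cluster.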


4 REPLIES

@Shantanu Sharma You should not copy hive-site.xml from the Hive conf directory for Spark. Spark uses a smaller, much simpler hive-site.xml.

cat /etc/spark2/conf/hive-site.xml

<configuration  xmlns:xi="http://www.w3.org/2001/XInclude">
    <property>
      <name>hive.exec.scratchdir</name>
      <value>/tmp/spark</value>
    </property>
    <property>
      <name>hive.metastore.client.connect.retry.delay</name>
      <value>5</value>
    </property>
    <property>
      <name>hive.metastore.client.socket.timeout</name>
      <value>1800</value>
    </property>
    <property>
      <name>hive.metastore.uris</name>
      <value>thrift://hive-thrift-fqdn:9083</value>
    </property>
    <property>
      <name>hive.server2.enable.doAs</name>
      <value>false</value>
    </property>
    <property>
      <name>hive.server2.thrift.http.port</name>
      <value>10002</value>
    </property>
    <property>
      <name>hive.server2.thrift.port</name>
      <value>10016</value>
    </property>
    <property>
      <name>hive.server2.transport.mode</name>
      <value>binary</value>
    </property>
</configuration>

The above is only an example; make sure you change the values to match your environment (in particular hive.metastore.uris).
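
If you would rather generate such a small file than edit one by hand, something along these lines could work. This is a rough sketch, not an official tool: render_hive_site is a hypothetical helper, and the values passed to it are placeholders you would replace with your own.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom


def render_hive_site(properties):
    """Build hive-site.xml text from a {name: value} mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    # Re-parse with minidom only to get indented, readable output.
    raw = ET.tostring(root, encoding="unicode")
    return minidom.parseString(raw).toprettyxml(indent="    ")
```

For example, render_hive_site({"hive.metastore.uris": "thrift://hostname:9083"}) returns the text of a file with a single property, which you could then save as the hive-site.xml in the Spark conf directory. The hostname here is a placeholder.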

HTH

Contributor

@Felix Albani

If I don't copy hive-site.xml from the Hive conf directory for Spark, I can't see the Hive databases in Spark (pyspark and spark-shell).

Could you please explain which properties I should add to Spark's hive-site.xml, and where I should update hive.metastore.uris?

If I copy the property below from the Hive conf to the Spark conf, will this work?

Technical stack details:

HDP 3.0
Spark 2.3
Hive 3.1

<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- hostname must point to the Hive metastore URI in your cluster -->
    <value>thrift://hostname:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>
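
For a standalone PySpark script, the same URI can also be supplied when the session is built instead of through hive-site.xml. The sketch below is an assumption to verify on your own cluster rather than a confirmed fix for this thread: it relies on Spark copying spark.hadoop.* settings into the Hadoop configuration, and it has no effect on a session that already exists (as in the interactive pyspark shell). The names metastore_conf and build_session are hypothetical helpers, and the hostname is a placeholder.

```python
def metastore_conf(metastore_uri):
    """Return the Spark setting that carries the metastore URI."""
    if not metastore_uri.startswith("thrift://"):
        raise ValueError("expected a thrift:// URI, got %r" % metastore_uri)
    # The spark.hadoop. prefix asks Spark to copy the property into the
    # Hadoop configuration (assumed here to be what the Hive client reads).
    return {"spark.hadoop.hive.metastore.uris": metastore_uri}


def build_session(metastore_uri, app_name="hive-metastore-check"):
    """Create a Hive-enabled SparkSession pointing at the given metastore."""
    # Imported here so metastore_conf() stays usable without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in metastore_conf(metastore_uri).items():
        builder = builder.config(key, value)
    return builder.enableHiveSupport().getOrCreate()
```

A script would then call build_session("thrift://hostname:9083").sql("SHOW DATABASES").show() to check which databases are visible.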

Contributor

HDP 3.0 integrates Apache Hive with Apache Spark differently, through the Hive Warehouse Connector.

The article below explains the steps:

https://community.hortonworks.com/content/kbentry/223626/integrating-apache-hive-with-apache-spark-h...
