Created 10-11-2018 08:58 AM
I am facing an issue while starting a spark-sql session.
Initially, when I started a Spark session, only the default database was visible (not Hive's default database, but Spark's own).
In order to see the Hive databases, I copied hive-site.xml from the Hive conf dir to the Spark conf dir. After copying hive-site.xml I get the error below.
$ spark-sql
WARN HiveConf: HiveConf of name hive.tez.cartesian-product.enabled does not exist
WARN HiveConf: HiveConf of name hive.metastore.warehouse.external.dir does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.use.ssl does not exist
WARN HiveConf: HiveConf of name hive.heapsize does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.port does not exist
WARN HiveConf: HiveConf of name hive.materializedview.rewriting.incremental does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.cors.allowed.headers does not exist
WARN HiveConf: HiveConf of name hive.driver.parallel.compilation does not exist
WARN HiveConf: HiveConf of name hive.tez.bucket.pruning does not exist
WARN HiveConf: HiveConf of name hive.hook.proto.base-directory does not exist
WARN HiveConf: HiveConf of name hive.load.data.owner does not exist
WARN HiveConf: HiveConf of name hive.execution.mode does not exist
WARN HiveConf: HiveConf of name hive.service.metrics.codahale.reporter.classes does not exist
WARN HiveConf: HiveConf of name hive.strict.managed.tables does not exist
WARN HiveConf: HiveConf of name hive.create.as.insert.only does not exist
WARN HiveConf: HiveConf of name hive.optimize.dynamic.partition.hashjoin does not exist
WARN HiveConf: HiveConf of name hive.server2.webui.enable.cors does not exist
WARN HiveConf: HiveConf of name hive.metastore.db.type does not exist
WARN HiveConf: HiveConf of name hive.txn.strict.locking.mode does not exist
WARN HiveConf: HiveConf of name hive.metastore.transactional.event.listeners does not exist
WARN HiveConf: HiveConf of name hive.tez.input.generate.consistent.splits does not exist
INFO metastore: Trying to connect to metastore with URI thrift://<host-name>:9083
INFO metastore: Connected to metastore.
INFO SessionState: Created local directory: /tmp/7b9d5455-e71a-4bd5-aa4b-385758b575a8_resources
INFO SessionState: Created HDFS directory: /tmp/hive/spark/7b9d5455-e71a-4bd5-aa4b-385758b575a8
INFO SessionState: Created local directory: /tmp/spark/7b9d5455-e71a-4bd5-aa4b-385758b575a8
INFO SessionState: Created HDFS directory: /tmp/hive/spark/7b9d5455-e71a-4bd5-aa4b-385758b575a8/_tmp_space.db
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:529)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:133)
    at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.tez.dag.api.SessionNotRunning
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 13 more
INFO ShutdownHookManager: Shutdown hook called
INFO ShutdownHookManager: Deleting directory /tmp/spark-911cc8f5-f53b-4ae6-add3-0c745581bead
$
I am able to run pyspark and spark-shell sessions successfully, and the Hive databases are visible to me in those sessions.
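For reference, this is how I check which databases a session can actually see (a minimal pyspark sketch; inside the pyspark shell the spark session already exists, so the builder is only needed when running it with spark-submit):

# Minimal sketch: list the databases visible to the Spark session.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("check-hive-dbs") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("SHOW DATABASES").show(truncate=False)

# Equivalent catalog API call
for db in spark.catalog.listDatabases():
    print(db.name)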
The error is related to Tez, and I confirmed that the Tez services are running fine. I am able to access Hive tables through HiveServer2 successfully.
I am using HDP 3.0, where the Hive execution engine is Tez (MapReduce has been removed).
Created 10-12-2018 07:49 PM
@Shantanu Sharma You should not copy the hive-site.xml from the Hive conf directory for Spark. Spark uses a smaller and much simpler hive-site.xml.
cat /etc/spark2/conf/hive-site.xml

<configuration xmlns:xi="http://www.w3.org/2001/XInclude">
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/spark</value>
  </property>
  <property>
    <name>hive.metastore.client.connect.retry.delay</name>
    <value>5</value>
  </property>
  <property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>1800</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://hive-thrift-fqdn:9083</value>
  </property>
  <property>
    <name>hive.server2.enable.doAs</name>
    <value>false</value>
  </property>
  <property>
    <name>hive.server2.thrift.http.port</name>
    <value>10002</value>
  </property>
  <property>
    <name>hive.server2.thrift.port</name>
    <value>10016</value>
  </property>
  <property>
    <name>hive.server2.transport.mode</name>
    <value>binary</value>
  </property>
</configuration>
The above is an example; just make sure you change the values according to your environment (in particular hive.metastore.uris).
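If editing config files is not convenient, the metastore URI can usually also be passed straight to the session builder; a minimal pyspark sketch (the thrift URI is a placeholder for your environment, and this relies on Spark copying hive.* settings through to its Hive client):

# Sketch: point Spark at the Hive metastore without a hive-site.xml.
# Replace the thrift URI with your own metastore host; it mirrors the
# hive.metastore.uris property from the config above.
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("hive-metastore-example") \
    .config("hive.metastore.uris", "thrift://hive-thrift-fqdn:9083") \
    .enableHiveSupport() \
    .getOrCreate()

spark.sql("SHOW DATABASES").show(truncate=False)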
HTH
Created 10-13-2018 05:54 PM
If I don't copy hive-site.xml from the Hive conf directory to the Spark conf directory, then I can't see the Hive databases in Spark (pyspark and spark-shell).
Could you please explain which properties I should add to Spark's hive-site.xml, and where I should set hive.metastore.uris?
If I copy the property below from the Hive conf to the Spark conf, will that work?
Technical Stack Details:
HDP 3.0
Spark 2.3
Hive 3.1
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- hostname must point to the Hive metastore URI in your cluster -->
    <value>thrift://hostname:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>
Created 10-15-2018 09:26 AM
I copied hive-site.xml from the Hive conf directory to /etc/spark2/conf/ and then removed the properties listed below from /etc/spark2/conf/hive-site.xml (a small clean-up script is sketched after the list).
It's working now; I can see the Hive databases in Spark (pyspark, spark-shell, spark-sql, etc.).
hive.tez.cartesian-product.enabled
hive.metastore.warehouse.external.dir
hive.server2.webui.use.ssl
hive.heapsize
hive.server2.webui.port
hive.materializedview.rewriting.incremental
hive.server2.webui.cors.allowed.headers
hive.driver.parallel.compilation
hive.tez.bucket.pruning
hive.hook.proto.base-directory
hive.load.data.owner
hive.execution.mode
hive.service.metrics.codahale.reporter.classes
hive.strict.managed.tables
hive.create.as.insert.only
hive.optimize.dynamic.partition.hashjoin
hive.server2.webui.enable.cors
hive.metastore.db.type
hive.txn.strict.locking.mode
hive.metastore.transactional.event.listeners
hive.tez.input.generate.consistent.splits
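For anyone repeating this, here is a rough clean-up sketch in Python (the file path and the property list are assumptions from my cluster; back up the file first, since ElementTree may not preserve comments or formatting exactly):

# Hypothetical clean-up sketch: remove Hive-only properties from a copied
# hive-site.xml so Spark's bundled Hive client does not choke on them.
import xml.etree.ElementTree as ET

UNSUPPORTED = {
    "hive.tez.cartesian-product.enabled",
    "hive.metastore.warehouse.external.dir",
    "hive.server2.webui.use.ssl",
    "hive.heapsize",
    "hive.server2.webui.port",
    "hive.materializedview.rewriting.incremental",
    "hive.server2.webui.cors.allowed.headers",
    "hive.driver.parallel.compilation",
    "hive.tez.bucket.pruning",
    "hive.hook.proto.base-directory",
    "hive.load.data.owner",
    "hive.execution.mode",
    "hive.service.metrics.codahale.reporter.classes",
    "hive.strict.managed.tables",
    "hive.create.as.insert.only",
    "hive.optimize.dynamic.partition.hashjoin",
    "hive.server2.webui.enable.cors",
    "hive.metastore.db.type",
    "hive.txn.strict.locking.mode",
    "hive.metastore.transactional.event.listeners",
    "hive.tez.input.generate.consistent.splits",
}

path = "/etc/spark2/conf/hive-site.xml"  # assumed location of the copied file
tree = ET.parse(path)
root = tree.getroot()

# Drop every <property> whose <name> is in the unsupported set.
for prop in list(root.findall("property")):
    if prop.findtext("name") in UNSUPPORTED:
        root.remove(prop)

tree.write(path, xml_declaration=True, encoding="utf-8")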
Do you foresee any consequences of removing these properties?
Created 10-29-2018 03:13 PM
HDP 3.0 has a different way of integrating Apache Hive with Apache Spark, using the Hive Warehouse Connector (HWC).
The article below explains the steps:
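In short, table access goes through a HiveWarehouseSession rather than plain enableHiveSupport(). A rough pyspark sketch, assuming the HWC jar and pyspark_llap zip are on the Spark path and the HWC properties (such as spark.sql.hive.hiveserver2.jdbc.url) are configured as described in the article; the database and table names are placeholders:

# Rough HWC sketch (HDP 3.x): read Hive managed tables through the
# Hive Warehouse Connector instead of Spark's built-in Hive support.
from pyspark.sql import SparkSession
from pyspark_llap import HiveWarehouseSession

spark = SparkSession.builder.appName("hwc-example").getOrCreate()

# Build an HWC session on top of the existing SparkSession.
hive = HiveWarehouseSession.session(spark).build()

hive.showDatabases().show(truncate=False)
# some_db.some_table is a placeholder table name.
hive.executeQuery("SELECT * FROM some_db.some_table LIMIT 10").show()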