<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Unable to find Hive database from Pyspark scripts in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-find-Hive-database-from-Pyspark-scripts/m-p/200413#M83659</link>
    <description>&lt;P&gt;Hi Sandeep,&lt;/P&gt;&lt;P&gt;I appreciate your response. I followed both of your suggestions:&lt;/P&gt;&lt;PRE&gt;# cp /etc/hive/conf/hive-site.xml /etc/spark/conf/&lt;/PRE&gt;&lt;P&gt;This did not result in a change in behavior.&lt;/P&gt;&lt;PRE&gt;# spark-submit --py-files ./spark.zip --files /etc/spark/conf/hive-site.xml sample.py&lt;/PRE&gt;&lt;P&gt;No change there, either.&lt;/P&gt;&lt;P&gt;I also tried creating &lt;EM&gt;sampletable&lt;/EM&gt; in the &lt;EM&gt;default&lt;/EM&gt; database instead. Same result, different error message:&lt;/P&gt;&lt;PRE&gt;'UnresolvedRelation `default`.`sampletable`&lt;/PRE&gt;&lt;P&gt;I suspect that the submitted job is looking at a different Hive metastore than the one I am viewing in Ambari &amp;gt; Hive 2.0 View, even though I am seeing output that would indicate, IMO, that the Spark job is using the correct Hive metastore:&lt;/P&gt;&lt;PRE&gt;SharedState:54 - Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('/apps/hive/warehouse').
2018-09-19 15:29:59 INFO  SharedState:54 - Warehouse path is '/apps/hive/warehouse'
&lt;/PRE&gt;&lt;P&gt;I am using the default master, which is 'local[*]'.&lt;/P&gt;&lt;P&gt;Any other ideas?&lt;/P&gt;</description>
    <pubDate>Wed, 19 Sep 2018 22:45:42 GMT</pubDate>
    <dc:creator>Falter_Christop</dc:creator>
    <dc:date>2018-09-19T22:45:42Z</dc:date>
    <item>
      <title>Unable to find Hive database from Pyspark scripts</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-find-Hive-database-from-Pyspark-scripts/m-p/200411#M83657</link>
      <description>&lt;P&gt;I am using the HDP 2.6.5 Sandbox on Docker/Windows 10.&lt;/P&gt;&lt;P&gt;Using the Hive 2 view, I created a database and table. Let's call them "sampledb" with a table called "sampletable" in HDFS location /sampledb/sampletable/. I stored data in the form of an ORC file in the appropriate directory, and invoked `msck repair table sampledb.sampletable`. From Ambari Hive View and Hive View 2.0, I can successfully read data from sampletable.&lt;/P&gt;&lt;P&gt;In Zeppelin I wrote a Pyspark script that uses the Spark SQL interface to read data from sampletable. It works, no problem. Note: this is Spark 2, not Spark 1.6.&lt;/P&gt;&lt;P&gt;We do not want to run Zeppelin scripts in production, so I converted the script to standard Python for use with spark-submit. The Spark initialization code is below:&lt;/P&gt;&lt;PRE&gt;settings = [
    ("hive.exec.dynamic.partition", "true"),
    ("hive.exec.dynamic.partition.mode", "nonstrict"),
    ("spark.sql.orc.filterPushdown", "true"),
    ("hive.msck.path.validation", "ignore"),
    ("spark.sql.caseSensitive", "true"),
    ("spark.speculation", "false"),
    ] 
spark_conf = SparkConf().setAppName("sampleApp").setAll(settings)
self.spark = SparkSession.builder. \
    enableHiveSupport(). \
    config(conf = spark_conf). \
    getOrCreate()
&lt;/PRE&gt;&lt;P&gt;Python dependencies are zipped into spark.zip and the script is called sample.py. After copying the script to my /home/ dir, I attempt to run it as follows:&lt;/P&gt;&lt;PRE&gt;# spark-submit --py-files ./spark.zip sample.py&lt;/PRE&gt;&lt;P&gt;The relevant error output, so far as I can tell, is the following console output:&lt;/P&gt;&lt;PRE&gt;SharedState:54 - Warehouse path is 'file:/home/raj_ops/spark-warehouse'.
...
HiveClientImpl:54 - Warehouse location for Hive client (version 1.2.2) is file:/home/raj_ops/spark-warehouse
...
Failed to get database sampledb, returning NoSuchObjectException                                              
u"Table or view not found: `sampledb`.`sampletable`
&lt;/PRE&gt;&lt;P&gt;The only thing I had to show for my effort was a new /spark-warehouse sub-directory within /home/raj_ops/. So I tried adding the following line to the Python script:&lt;/P&gt;&lt;PRE&gt;("spark.sql.warehouse.dir", "/apps/hive/warehouse")&lt;/PRE&gt;&lt;P&gt;The full configuration is now as follows:&lt;/P&gt;&lt;PRE&gt;settings = [
    ("hive.exec.dynamic.partition", "true"),
    ("hive.exec.dynamic.partition.mode", "nonstrict"),
    ("spark.sql.orc.filterPushdown", "true"),
    ("hive.msck.path.validation", "ignore"),
    ("spark.sql.caseSensitive", "true"),
    ("spark.speculation", "false"),
    ("spark.sql.warehouse.dir", "/apps/hive/warehouse")
    ] 
spark_conf = SparkConf().setAppName("sampleApp").setAll(settings)
&lt;/PRE&gt;&lt;P&gt;The invocation of spark-submit was the same. The outcome was similarly unsuccessful.&lt;/P&gt;&lt;P&gt;What else do I need to do to get the queries from Pyspark scripts to access the Hive database and tables successfully? &lt;/P&gt;&lt;P&gt;Thanks in advance!&lt;/P&gt;&lt;P&gt;Chris Falter&lt;/P&gt;</description>
      <pubDate>Wed, 19 Sep 2018 17:36:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-find-Hive-database-from-Pyspark-scripts/m-p/200411#M83657</guid>
      <dc:creator>Falter_Christop</dc:creator>
      <dc:date>2018-09-19T17:36:05Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to find Hive database from Pyspark scripts</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-find-Hive-database-from-Pyspark-scripts/m-p/200412#M83658</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/17970/falter-christopher.html" nodeid="17970"&gt;@Christopher Falter&lt;/A&gt;&lt;P&gt;What is the master you are running the job with? &lt;/P&gt;&lt;P&gt;Make sure hive-site.xml is present in /etc/spark/conf/ and if it is a yarn-cluster mode please pass the hive-site.xml using --files parameter in spark-submit command. &lt;/P&gt;</description>
      <pubDate>Wed, 19 Sep 2018 17:48:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-find-Hive-database-from-Pyspark-scripts/m-p/200412#M83658</guid>
      <dc:creator>sandyy006</dc:creator>
      <dc:date>2018-09-19T17:48:27Z</dc:date>
    </item>
    <item>
      <title>Re: Unable to find Hive database from Pyspark scripts</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-find-Hive-database-from-Pyspark-scripts/m-p/200413#M83659</link>
      <description>&lt;P&gt;Hi Sandeep,&lt;/P&gt;&lt;P&gt;I appreciate your response. I followed both of your suggestions:&lt;/P&gt;&lt;PRE&gt;# cp /etc/hive/conf/hive-site.xml /etc/spark/conf/&lt;/PRE&gt;&lt;P&gt;This did not result in a change in behavior.&lt;/P&gt;&lt;PRE&gt;# spark-submit --py-files ./spark.zip --files /etc/spark/conf/hive-site.xml sample.py&lt;/PRE&gt;&lt;P&gt;No change there, either.&lt;/P&gt;&lt;P&gt;I also tried creating &lt;EM&gt;sampletable&lt;/EM&gt; in the &lt;EM&gt;default&lt;/EM&gt; database instead. Same result, different error message:&lt;/P&gt;&lt;PRE&gt;'UnresolvedRelation `default`.`sampletable`&lt;/PRE&gt;&lt;P&gt;I suspect that the submitted job is looking at a different Hive metastore than the one I am viewing in Ambari &amp;gt; Hive 2.0 View, even though I am seeing output that would indicate, IMO, that the Spark job is using the correct Hive metastore:&lt;/P&gt;&lt;PRE&gt;SharedState:54 - Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('/apps/hive/warehouse').
2018-09-19 15:29:59 INFO  SharedState:54 - Warehouse path is '/apps/hive/warehouse'
&lt;/PRE&gt;&lt;P&gt;I am using the default master, which is 'local[*]'.&lt;/P&gt;&lt;P&gt;Any other ideas?&lt;/P&gt;</description>
      <pubDate>Wed, 19 Sep 2018 22:45:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-find-Hive-database-from-Pyspark-scripts/m-p/200413#M83659</guid>
      <dc:creator>Falter_Christop</dc:creator>
      <dc:date>2018-09-19T22:45:42Z</dc:date>
    </item>
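When two metastores may be in play, as suspected in the reply above, a quick first check is whether the thrift endpoint named in hive.metastore.uris is even reachable from the machine where spark-submit runs. A minimal standard-library sketch (the host and port below are examples taken from this thread, not verified):

```python
import socket

def metastore_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to the given metastore endpoint
    succeeds within the timeout, False otherwise."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example endpoint from this thread's hive.metastore.uris setting:
# metastore_reachable("sandbox-hdp.hortonworks.com", 9083)
```

This only proves TCP connectivity, not that the endpoint is the *same* metastore Ambari's Hive View talks to, but a False here rules out half the debugging space immediately.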
    <item>
      <title>Re: Unable to find Hive database from Pyspark scripts</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-find-Hive-database-from-Pyspark-scripts/m-p/200414#M83660</link>
      <description>&lt;P&gt;I was able to resolve the problem by manually editing my Python script to incorporate the settings from /etc/hive/conf/hive-site.xml. What worked is the following code:&lt;/P&gt;&lt;PRE&gt;        settings = [
                ("hive.exec.dynamic.partition", "true"),
                ("hive.exec.dynamic.partition.mode", "nonstrict"),
                ("spark.sql.orc.filterPushdown", "true"),
                ("hive.msck.path.validation", "ignore"),
                ("spark.sql.caseSensitive", "true"),
                ("spark.speculation", "false"),
                ("hive.metastore.authorization.storage.checks", "false"),
                ("hive.metastore.cache.pinobjtypes", "Table,Database,Type,FieldSchema,Order"),
                ("hive.metastore.client.connect.retry.delay", "5s"),
                ("hive.metastore.client.socket.timeout", "1800s"),
                ("hive.metastore.connect.retries", "12"),
                ("hive.metastore.execute.setugi", "false"),
                ("hive.metastore.failure.retries", "12"),
                ("hive.metastore.kerberos.keytab.file", "/etc/security/keytabs/hive.service.keytab"),
                ("hive.metastore.kerberos.principal", "hive/_HOST@EXAMPLE.COM"),
                ("hive.metastore.pre.event.listeners", "org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener"),
                ("hive.metastore.sasl.enabled", "false"),
                ("hive.metastore.schema.verification", "false"),
                ("hive.metastore.schema.verification.record.version", "false"),
                ("hive.metastore.server.max.threads", "100000"),
                ("hive.metastore.uris", "thrift://sandbox-hdp.hortonworks.com:9083"),
                ("hive.metastore.warehouse.dir", "/apps/hive/warehouse")
                ] 
        spark_conf = SparkConf().setAppName("sampleApp").setAll(settings)
        self.spark = SparkSession.builder. \
            config(conf = spark_conf). \
            enableHiveSupport(). \
            getOrCreate()&lt;/PRE&gt;&lt;P&gt;What did *not* work was copying /etc/hive/conf/hive-site.xml to /etc/spark2/conf/hive-site.xml. I don't know why this didn't work, but it didn't.&lt;/P&gt;&lt;P&gt;I hope this resolution helps someone else!&lt;/P&gt;&lt;P&gt;Chris Falter&lt;/P&gt;</description>
      <pubDate>Mon, 24 Sep 2018 22:00:12 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-find-Hive-database-from-Pyspark-scripts/m-p/200414#M83660</guid>
      <dc:creator>Falter_Christop</dc:creator>
      <dc:date>2018-09-24T22:00:12Z</dc:date>
    </item>
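The resolution above hand-copies every property from /etc/hive/conf/hive-site.xml into the Python settings list. The same list can be generated from hive-site.xml itself, which avoids transcription slips. A minimal sketch using only the Python standard library (the helper name is illustrative; the two sample properties are taken from this thread):

```python
import io
import xml.etree.ElementTree as ET

def hive_site_to_settings(source):
    """Parse a hive-site.xml file (path or file-like object) into a list of
    (name, value) pairs suitable for SparkConf().setAll()."""
    settings = []
    for prop in ET.parse(source).getroot().iter("property"):
        name = prop.findtext("name")
        if name:
            settings.append((name, prop.findtext("value", default="")))
    return settings

# Build a tiny in-memory stand-in for /etc/hive/conf/hive-site.xml,
# using two property names that appear in this thread.
root = ET.Element("configuration")
for name, value in [
    ("hive.metastore.uris", "thrift://sandbox-hdp.hortonworks.com:9083"),
    ("hive.metastore.warehouse.dir", "/apps/hive/warehouse"),
]:
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value

settings = hive_site_to_settings(io.BytesIO(ET.tostring(root)))
```

In a real script one would pass the actual path, e.g. hive_site_to_settings("/etc/hive/conf/hive-site.xml"), and feed the result to SparkConf().setAll() as in the posts above.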
    <item>
      <title>Re: Unable to find Hive database from Pyspark scripts</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-find-Hive-database-from-Pyspark-scripts/m-p/200415#M83661</link>
      <description>&lt;A rel="user" href="https://community.cloudera.com/users/17970/falter-christopher.html" nodeid="17970"&gt;@Christopher Falter&lt;/A&gt;&lt;P&gt;Sorry i didn't get a chance to test this. Usually with /etc/spark2/conf/hive-site.xml spark should be able to connect to hive. &lt;/P&gt;</description>
      <pubDate>Mon, 24 Sep 2018 23:34:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Unable-to-find-Hive-database-from-Pyspark-scripts/m-p/200415#M83661</guid>
      <dc:creator>sandyy006</dc:creator>
      <dc:date>2018-09-24T23:34:31Z</dc:date>
    </item>
  </channel>
</rss>

