<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark 2 beta load or save Hive managed table in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47410#M46261</link>
    <pubDate>Wed, 16 Nov 2016 01:13:17 GMT</pubDate>
    <dc:creator>zhuangmz</dc:creator>
    <dc:date>2016-11-16T01:13:17Z</dc:date>
    <item>
      <title>Spark 2 beta load or save Hive managed table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47397#M46259</link>
      <description>&lt;P&gt;Hi, when I try to list the tables in Hive, it shows nothing:&lt;BR /&gt;import org.apache.spark.sql.SparkSession&lt;BR /&gt;val spark = SparkSession.builder().appName("Spark2 Hive Example").config("spark.sql.warehouse.dir", "hdfs://quickstart.cloudera/user/hive/warehouse").enableHiveSupport().getOrCreate()&lt;BR /&gt;spark.catalog.listTables("default").show()&lt;/P&gt;&lt;P&gt;I'm using CDH-5.9.0-1.cdh5.9.0.p0.23 and SPARK2-2.0.0.cloudera.beta2-1.cdh5.7.0.p0.110234.&lt;BR /&gt;Could anyone show me how to load a DataFrame from Hive and save one back into it?&lt;/P&gt;&lt;P&gt;Thanks.&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:47:37 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47397#M46259</guid>
      <dc:creator>zhuangmz</dc:creator>
      <dc:date>2022-09-16T10:47:37Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 2 beta load or save Hive managed table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47406#M46260</link>
      <description>&lt;P&gt;If you are only concerned with static DataFrames (and not streaming), this is pretty straightforward programmatically.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To create a DataFrame from a Hive table with an example query:&lt;/P&gt;&lt;P&gt;df = spark.sql("SELECT * FROM &lt;EM&gt;table_name1&lt;/EM&gt;")&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;To save a DataFrame back to a Hive table:&lt;/P&gt;&lt;P&gt;df.write.saveAsTable('table_name2', format='parquet', mode='overwrite')&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Now, you may want to try listing databases instead of tables. Listing tables only shows the tables in your &lt;EM&gt;current&lt;/EM&gt; database. The default database is likely empty if you're just starting out.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My struggle is with Spark Streaming in version 2.0.0.cloudera.beta1, where the saveAsTable method is not available for a streaming DataFrame. That makes it all a bit trickier than the static DataFrame read/write.&lt;/P&gt;</description>
      <pubDate>Tue, 15 Nov 2016 22:10:01 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47406#M46260</guid>
      <dc:creator>BrianWhite</dc:creator>
      <dc:date>2016-11-15T22:10:01Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 2 beta load or save Hive managed table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47410#M46261</link>
      <description>&lt;P&gt;Thank you, Brian.&lt;/P&gt;&lt;P&gt;I think the problem is that Spark 2 cannot connect to the Hive metastore.&lt;/P&gt;&lt;P&gt;The &lt;STRONG&gt;default&lt;/STRONG&gt; database is not empty.&lt;/P&gt;&lt;P&gt;If I use spark-shell in Spark 1.6.0 (/opt/cloudera/parcels/CDH-5.9.0-1.cdh5.9.0.p0.23/lib/spark):&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;scala&amp;gt; sqlContext.sql("show tables").show()&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;scala&amp;gt; sys.env("HADOOP_CONF_DIR")&lt;BR /&gt;res1: String = /usr/lib/spark/conf/yarn-conf:/etc/hive/conf:/etc/hive/conf&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;It works well and prints all the tables managed by Hive.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, in Spark 2.0.0 (SPARK2-2.0.0.cloudera.beta1-1.cdh5.7.0.p0.108015):&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;scala&amp;gt; sys.env("HADOOP_CONF_DIR")&lt;/STRONG&gt;&lt;BR /&gt;&lt;STRONG&gt;res0: String = /opt/cloudera/parcels/SPARK2-2.0.0.cloudera.beta2-1.cdh5.7.0.p0.110234/lib/spark2/conf/yarn-conf&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;There is no Hive-related conf dir in $HADOOP_CONF_DIR.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;By the way, in Spark 2.0.0:&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;scala&amp;gt; val df = spark.read.parquet("/user/hive/warehouse/test_db.db/test_table_pqt")&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;scala&amp;gt; df.show(5)&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;This works on a Hive table that was already managed under Spark 1.6.0.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Nov 2016 01:13:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47410#M46261</guid>
      <dc:creator>zhuangmz</dc:creator>
      <dc:date>2016-11-16T01:13:17Z</dc:date>
    </item>
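<!--
The HADOOP_CONF_DIR comparison in the reply above can be checked mechanically. The following is a minimal Python sketch, assuming that Spark picks up the metastore settings from a hive-site.xml file in one of the colon-separated directories. The helper name find_hive_site is hypothetical and is not part of Spark or CDH.

```python
import os


def find_hive_site(conf_dirs):
    """Return the directories from a colon-separated list that contain hive-site.xml."""
    hits = []
    for directory in conf_dirs.split(":"):
        # Skip empty entries produced by a leading, trailing, or doubled colon.
        if directory and os.path.isfile(os.path.join(directory, "hive-site.xml")):
            hits.append(directory)
    return hits


if __name__ == "__main__":
    # Assumption: the variable is set the same way as in the spark-shell sessions quoted above.
    found = find_hive_site(os.environ.get("HADOOP_CONF_DIR", ""))
    print("hive-site.xml found in:", found)
```

For the Spark 2 shell output quoted above, where only the yarn-conf directory is listed, this would return an empty list if that directory holds no hive-site.xml.
-->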
    <item>
      <title>Re: Spark 2 beta load or save Hive managed table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47411#M46262</link>
      <description>Another concern is $SPARK_LIBRARY_PATH.&lt;BR /&gt;There might be something wrong with the Hive jar dependencies.</description>
      <pubDate>Wed, 16 Nov 2016 01:28:22 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47411#M46262</guid>
      <dc:creator>zhuangmz</dc:creator>
      <dc:date>2016-11-16T01:28:22Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 2 beta load or save Hive managed table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47415#M46263</link>
      <description>&lt;P&gt;I'm so stupid...&lt;/P&gt;&lt;P&gt;There's a &lt;STRONG&gt;Hive Service&lt;/STRONG&gt; configuration item in Spark 2.0.0 beta2...&lt;/P&gt;&lt;P&gt;Just check that it is enabled and set to the correct Hive service in CDH.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Nov 2016 02:55:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/47415#M46263</guid>
      <dc:creator>zhuangmz</dc:creator>
      <dc:date>2016-11-16T02:55:43Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 2 beta load or save Hive managed table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/54663#M46264</link>
      <description>&lt;P&gt;Ran into the same problem, resolved by enabling 'Hive Service' in Spark2.&lt;/P&gt;</description>
      <pubDate>Thu, 11 May 2017 23:14:28 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-beta-load-or-save-Hive-managed-table/m-p/54663#M46264</guid>
      <dc:creator>prakharpanwaria</dc:creator>
      <dc:date>2017-05-11T23:14:28Z</dc:date>
    </item>
  </channel>
</rss>

