<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Saving Spark 2.2 dataframes in Hive table in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/59999#M66094</link>
    <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sorry, I forgot to come back here and explain the quick workaround I found.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So, here's how I do it:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;import org.apache.spark.sql.DataFrameWriter

// In my setup every database has its own warehouse directory; I am not using
// the default warehouse. I keep databases and tables under the users' directories.
val options = Map("path" -&amp;gt; "this is the path to your warehouse")

// and simply write it!
df.write.options(options).saveAsTable("db_name.table_name")&lt;/PRE&gt;&lt;P&gt;So, as you can see, a simple path to&amp;nbsp;the warehouse of the database solves the problem.&amp;nbsp;I would say Spark 2 is not aware of this metadata, yet when you look at your spark.catalog you can see everything is there! So I don't know why it can't work out the path to your database when you want to write and save.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this helps &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Sat, 16 Sep 2017 11:36:57 GMT</pubDate>
    <dc:creator>maziyar</dc:creator>
    <dc:date>2017-09-16T11:36:57Z</dc:date>
    <item>
      <title>Saving Spark 2.2 dataframes in Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/58463#M66092</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a problem with Spark 2.2 (latest CDH 5.12.0) when saving a DataFrame into a Hive table.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Things I can do:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;1. I can easily read tables from Hive in Spark 2.2&lt;/P&gt;&lt;P&gt;2. I can do saveAsTable in Spark 1.6 into a Hive table and read it from&amp;nbsp;Spark 2.2&lt;/P&gt;&lt;P&gt;3. I can do write.saveAsTable in Spark 2.2 and see the files and data inside the Hive table&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;Things I cannot do in Spark 2.2:&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;4. When I read a Hive table saved by Spark 2.2 in spark2-shell, it shows empty rows. It has all the fields and the schema, but no data.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I don't understand what could cause this problem. Any help would be appreciated.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Example:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;scala&amp;gt; val df = sc.parallelize(
     |   Seq(
     |     ("first", Array(2.0, 1.0, 2.1, 5.4)),
     |     ("test", Array(1.5, 0.5, 0.9, 3.7)),
     |     ("choose", Array(8.0, 2.9, 9.1, 2.5))
     |   ), 3
     | ).toDF
df: org.apache.spark.sql.DataFrame = [_1: string, _2: array&amp;lt;double&amp;gt;]

scala&amp;gt; df.show
+------+--------------------+
|    _1|                  _2|
+------+--------------------+
| first|[2.0, 1.0, 2.1, 5.4]|
|  test|[1.5, 0.5, 0.9, 3.7]|
|choose|[8.0, 2.9, 9.1, 2.5]|
+------+--------------------+

scala&amp;gt; df.write.saveAsTable("database.test")

scala&amp;gt; val savedDF = spark.sql("SELECT * FROM database.test")
savedDF: org.apache.spark.sql.DataFrame = [_1: string, _2: array&amp;lt;double&amp;gt;]

scala&amp;gt; savedDF.show
+---+---+
|_1|_2|
+---+---+
+---+---+
scala&amp;gt; savedDF.count
res55: Long = 0&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 12:03:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/58463#M66092</guid>
      <dc:creator>maziyar</dc:creator>
      <dc:date>2022-09-16T12:03:06Z</dc:date>
    </item>
    <item>
      <title>Re: Saving Spark 2.2 dataframes in Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/59967#M66093</link>
      <description>&lt;P&gt;I'm having the same problem under Cloudera CDH 5.12.1. I was previously using CDH 5.10.1 and upgraded in the hope that the error would be resolved, but it persists in the latest version of CDH.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have filed a bug in the Apache Spark bug tracker describing the problem and a workaround (manually specifying the path and saving as an external table, or manually updating the Hive metastore data) here:&amp;nbsp;&lt;A href="https://issues.apache.org/jira/browse/SPARK-21994" target="_blank"&gt;https://issues.apache.org/jira/browse/SPARK-21994&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The problem seems to be that Spark does not write the path to the Hive metastore. Any advice on how to fix this?&lt;/P&gt;</description>
      <pubDate>Fri, 15 Sep 2017 12:11:41 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/59967#M66093</guid>
      <dc:creator>pederpansen</dc:creator>
      <dc:date>2017-09-15T12:11:41Z</dc:date>
    </item>
    <item>
      <title>Re: Saving Spark 2.2 dataframes in Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/59999#M66094</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sorry, I forgot to come back here and explain the quick workaround I found.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So, here's how I do it:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;import org.apache.spark.sql.DataFrameWriter

// In my setup every database has its own warehouse directory; I am not using
// the default warehouse. I keep databases and tables under the users' directories.
val options = Map("path" -&amp;gt; "this is the path to your warehouse")

// and simply write it!
df.write.options(options).saveAsTable("db_name.table_name")&lt;/PRE&gt;&lt;P&gt;So, as you can see, a simple path to&amp;nbsp;the warehouse of the database solves the problem.&amp;nbsp;I would say Spark 2 is not aware of this metadata, yet when you look at your spark.catalog you can see everything is there! So I don't know why it can't work out the path to your database when you want to write and save.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Hope this helps &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 16 Sep 2017 11:36:57 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/59999#M66094</guid>
      <dc:creator>maziyar</dc:creator>
      <dc:date>2017-09-16T11:36:57Z</dc:date>
    </item>
    <item>
      <title>Re: Saving Spark 2.2 dataframes in Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/65144#M66095</link>
      <description>&lt;P&gt;This workaround has a &lt;STRONG&gt;severe&lt;/STRONG&gt; problem.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;val options = Map("path" -&amp;gt; "this is the path to your warehouse")&lt;/PRE&gt;&lt;P&gt;Do NOT do this. When you specify "path" as just the warehouse location, Spark will assume that is the location to be purged during an overwrite. This can wipe everything in your warehouse. So, if you put "user/hive/warehouse", it will delete everything in "user/hive/warehouse". This is bad and should not have been marked as the accepted answer.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I think the only reason maziyar didn't have everything wiped is that he is using a separate warehouse for each DB, or he is actually specifying the full path to each table.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Mar 2018 17:26:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/65144#M66095</guid>
      <dc:creator>aeastman</dc:creator>
      <dc:date>2018-03-06T17:26:13Z</dc:date>
    </item>
    <item>
      <title>Re: Saving Spark 2.2 dataframes in Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/65153#M66096</link>
      <description>&lt;P&gt;This is a good point to keep in mind for anyone who chooses this method, since it is the only workaround for saving tables into Hive on older versions of Cloudera.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;NOTE: This problem only happened in Cloudera's distribution and has been &lt;STRONG&gt;solved&lt;/STRONG&gt; in newer Hive/Spark versions. As of CDH 5.14.0 it has been completely&amp;nbsp;resolved.&lt;/P&gt;</description>
      <pubDate>Tue, 06 Mar 2018 23:11:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/65153#M66096</guid>
      <dc:creator>maziyar</dc:creator>
      <dc:date>2018-03-06T23:11:13Z</dc:date>
    </item>
    <item>
      <title>Re: Saving Spark 2.2 dataframes in Hive table</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/69310#M66097</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm still getting this error in CDH 5.14 with spark2-shell.&lt;/P&gt;</description>
      <pubDate>Fri, 22 Jun 2018 08:17:20 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Saving-Spark-2-2-dataframs-in-Hive-table/m-p/69310#M66097</guid>
      <dc:creator>kiakaku_vietnam</dc:creator>
      <dc:date>2018-06-22T08:17:20Z</dc:date>
    </item>
  </channel>
</rss>

