<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Spark 2.1 Hive ORC saveAsTable pyspark in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-1-Hive-ORC-saveAsTable-pyspark/m-p/178025#M72938</link>
    <description>&lt;P&gt;I added the same configs to spark2-client/conf/hive-site.xml, and the following worked:&lt;/P&gt;&lt;PRE&gt;union_df.write.mode("append") \
        .insertInto("scratch.daily_test",overwrite = False)
&lt;/PRE&gt;</description>
    <pubDate>Sat, 30 Dec 2017 04:16:34 GMT</pubDate>
    <dc:creator>sanjeevreddy009</dc:creator>
    <dc:date>2017-12-30T04:16:34Z</dc:date>
    <item>
      <title>Spark 2.1 Hive ORC saveAsTable pyspark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-1-Hive-ORC-saveAsTable-pyspark/m-p/178024#M72937</link>
      <description>&lt;P&gt;Please help.&lt;/P&gt;&lt;P&gt;I created an empty external table in ORC format, partitioned on 2 columns, in Hive through the CLI.&lt;/P&gt;&lt;P&gt;I then logged into the pyspark shell and ran the following code:&lt;/P&gt;&lt;PRE&gt;from pyspark.sql import SparkSession
spark = SparkSession.builder \
                    .enableHiveSupport() \
                    .config("hive.exec.dynamic.partition", "true") \
                    .config("hive.exec.dynamic.partition.mode", "nonstrict") \
                    .config("hive.exec.max.dynamic.partitions", "3000") \
                    .getOrCreate()

from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)

# ... ETL steps that build union_df ...
# just to be safe, set these one more time
sqlContext.setConf("hive.exec.dynamic.partition", "true")
sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
union_df.write.mode("append") \
        .insertInto("scratch.daily_test",overwrite = False)
&lt;/PRE&gt;&lt;P&gt;The above code fails because the number of dynamic partitions created is greater than 1000:&lt;/P&gt;&lt;PRE&gt; pyspark.sql.utils.AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 2905, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 2905 &lt;/PRE&gt;&lt;P&gt;This is why I set the max partitions to 3000 in the session builder.&lt;/P&gt;&lt;P&gt;I also checked hive-site.xml, where max partitions is set to 5000 (this is in hive-client/conf/hive-site.xml):&lt;/P&gt;&lt;PRE&gt; &amp;lt;property&amp;gt;
      &amp;lt;name&amp;gt;hive.exec.dynamic.partition&amp;lt;/name&amp;gt;
      &amp;lt;value&amp;gt;true&amp;lt;/value&amp;gt;
    &amp;lt;/property&amp;gt;


    &amp;lt;property&amp;gt;
      &amp;lt;name&amp;gt;hive.exec.dynamic.partition.mode&amp;lt;/name&amp;gt;
      &amp;lt;value&amp;gt;nonstrict&amp;lt;/value&amp;gt;
    &amp;lt;/property&amp;gt;

    &amp;lt;property&amp;gt;
      &amp;lt;name&amp;gt;hive.exec.max.dynamic.partitions&amp;lt;/name&amp;gt;
      &amp;lt;value&amp;gt;5000&amp;lt;/value&amp;gt;
    &amp;lt;/property&amp;gt;
&lt;/PRE&gt;&lt;P&gt;When I ran &lt;/P&gt;&lt;PRE&gt; union_df.write.mode("append").format("orc").partitionBy("country","date_str").saveAsTable("scratch.daily_test")

I get the following:
pyspark.sql.utils.AnalysisException:
u'Saving data in the Hive serde table scratch.daily_test is not supported yet.
Please use the insertInto() API as an alternative.;'
&lt;/PRE&gt;&lt;P&gt;When I run &lt;/P&gt;&lt;PRE&gt;union_df.write.mode("append").format("orc").partitionBy("country","date_str").insertInto("scratch.daily_test")

I get pyspark.sql.utils.AnalysisException:
u"insertInto() can't be used together with partitionBy(). Partition
columns have already be defined for the table. It is not necessary to use
partitionBy().;"
&lt;/PRE&gt;&lt;P&gt;As of now, the following works, but it overwrites the entire external table and stores it as Parquet:&lt;/P&gt;&lt;PRE&gt; union_df.write.mode("overwrite").partitionBy("country","date_str").saveAsTable("scratch.daily_test")
&lt;/PRE&gt;&lt;P&gt;Questions:&lt;/P&gt;&lt;P&gt;1. How do you insert into a partitioned table in ORC format?&lt;/P&gt;&lt;P&gt;2. How do I work around the hive.exec.max.dynamic.partitions limit?&lt;/P&gt;&lt;P&gt;Please let me know if you need any additional details.&lt;/P&gt;</description>
      <pubDate>Sat, 30 Dec 2017 00:25:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-1-Hive-ORC-saveAsTable-pyspark/m-p/178024#M72937</guid>
      <dc:creator>sanjeevreddy009</dc:creator>
      <dc:date>2017-12-30T00:25:45Z</dc:date>
    </item>
    <item>
      <title>Re: Spark 2.1 Hive ORC saveAsTable pyspark</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-1-Hive-ORC-saveAsTable-pyspark/m-p/178025#M72938</link>
      <description>&lt;P&gt;I added the same configs to spark2-client/conf/hive-site.xml, and the following worked:&lt;/P&gt;&lt;PRE&gt;union_df.write.mode("append") \
        .insertInto("scratch.daily_test",overwrite = False)
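
# A hedged sketch (an addition, not part of the original fix). insertInto() matches
# columns by position, not by name, so for a partitioned Hive table the partition
# columns are expected last. The helper name order_for_insert is hypothetical, and
# the partition order (country, date_str) is assumed from the partitionBy() calls
# in the question.
def order_for_insert(columns, partition_cols):
    """Return columns with the partition columns moved to the end, in the given order."""
    data_cols = [c for c in columns if c not in partition_cols]
    return data_cols + list(partition_cols)

# Usage, assuming the remaining columns of union_df are already in table order:
# cols = order_for_insert(union_df.columns, ["country", "date_str"])
# union_df.select(*cols).write.insertInto("scratch.daily_test", overwrite=False)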
&lt;/PRE&gt;</description>
      <pubDate>Sat, 30 Dec 2017 04:16:34 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Spark-2-1-Hive-ORC-saveAsTable-pyspark/m-p/178025#M72938</guid>
      <dc:creator>sanjeevreddy009</dc:creator>
      <dc:date>2017-12-30T04:16:34Z</dc:date>
    </item>
  </channel>
</rss>

