Member since
01-21-2016
2
Posts
0
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
5781 | 12-29-2017 08:16 PM |
12-29-2017
08:16 PM
I added same configs to spark2-client/conf/hive-site.xml and the following worked. union_df.write.mode("append") \
.insertInto("scratch.daily_test",overwrite = False)
... View more
12-29-2017
04:25 PM
Please Help. I created a empty external table ORC format with 2 partitions in hive through cli I then loginto pyspark shell and run the following code from pyspark.sql import SparkSession
spark = SparkSession.builder \
.enableHiveSupport() \
.config("hive.exec.dynamic.partition", "true") \
.config("hive.exec.dynamic.partition.mode", "nonstrict") \
.config("hive.exec.max.dynamic.partitions", "3000") \
.getOrCreate()
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
**** ETL
#just to be save one more time
sqlContext.setConf("hive.exec.dynamic.partition", "true")
sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
union_df.write.mode("append") \
.insertInto("scratch.daily_test",overwrite = False)
The Above code fails because the number of partitions in is > 1000 pyspark.sql.utils.AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 2905, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 2905 Which is why I set the max partitions to 3000 in the session builder. I also checked hive-site.xml max partitions are 5000 ( this is in hive-client/conf/hive-site.xml) <property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.exec.max.dynamic.partitions</name>
<value>5000</value>
</property>
When I ran union_df.write.mode("append").format("orc").partitionBy("country","date_str").saveAsTable("scratch.daily_test")
I Get the following
pyspark.sql.utils.AnalysisException:
u'Saving data in the Hive serde table scratch.daily_test is not supported yet.
Please use the insertInto() API as an alternative.;'
When I run union_df.write.mode("append").format("orc").partitionBy("country","date_str").insertInto("scratch.daily_test")
I get pyspark.sql.utils.AnalysisException:
u"insertInto() can't be used together with partitionBy(). Partition
columns have already be defined for the table. It is not necessary to use
partitionBy().;"
As of Now the following works but it overwrites the entire External structure to Parquet union_df.write.mode("overwrite").partitionBy("country","date_str").saveAsTable("scratch.daily_test")
Questions: 1. How do you insert into table with ORC format with partitions ? 2. How to work around the hive.exec.max.dynamic.partitions ? Please let me know if you need any additional details
... View more
Labels:
- Labels:
-
Apache Hive
-
Apache Spark