Member since: 01-21-2016
Posts: 2
Kudos Received: 0
Solutions: 1
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 5737 | 12-29-2017 08:16 PM |
12-29-2017 08:16 PM
I added the same configs to spark2-client/conf/hive-site.xml and the following worked:

union_df.write.mode("append") \
    .insertInto("scratch.daily_test", overwrite=False)
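For reference, "the same configs" are presumably the three dynamic-partition properties quoted in the question below; a minimal sketch of what gets added to spark2-client/conf/hive-site.xml (values taken from the hive-client config shown in the question):

<property>
  <name>hive.exec.dynamic.partition</name>
  <value>true</value>
</property>
<property>
  <name>hive.exec.dynamic.partition.mode</name>
  <value>nonstrict</value>
</property>
<property>
  <!-- value from hive-client/conf per the question; must be >= the partition count (2905 here) -->
  <name>hive.exec.max.dynamic.partitions</name>
  <value>5000</value>
</property>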
12-29-2017 04:25 PM
Please help. I created an empty external table in ORC format with two partitions in Hive through the CLI (a sketch of that DDL follows the code below). I then log into the pyspark shell and run the following:

from pyspark.sql import SparkSession
spark = SparkSession.builder \
.enableHiveSupport() \
.config("hive.exec.dynamic.partition", "true") \
.config("hive.exec.dynamic.partition.mode", "nonstrict") \
.config("hive.exec.max.dynamic.partitions", "3000") \
.getOrCreate()
from pyspark.sql import HiveContext
sqlContext = HiveContext(sc)
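# note: in Spark 2.x, HiveContext is deprecated; the Hive-enabled
# SparkSession built above can do the same work via spark.sql(...)
# and spark.conf.set(...), so this extra context is redundant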
# **** ETL steps elided ****
# just to be safe, set the configs one more time
sqlContext.setConf("hive.exec.dynamic.partition", "true")
sqlContext.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
union_df.write.mode("append") \
    .insertInto("scratch.daily_test", overwrite=False)
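For context, the external table was created with Hive DDL along these lines; the data columns and location are hypothetical, only the table name, ORC format, and the country/date_str partition columns appear in the post:

-- sketch of the DDL; columns and location are hypothetical
CREATE EXTERNAL TABLE scratch.daily_test (
  id BIGINT,        -- hypothetical column
  payload STRING    -- hypothetical column
)
PARTITIONED BY (country STRING, date_str STRING)
STORED AS ORC
LOCATION '/tmp/scratch/daily_test';  -- hypothetical location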
The insertInto call above fails because the number of dynamic partitions is greater than 1000:

pyspark.sql.utils.AnalysisException: u'org.apache.hadoop.hive.ql.metadata.HiveException: Number of dynamic partitions created is 2905, which is more than 1000. To solve this try to set hive.exec.max.dynamic.partitions to at least 2905.'

That is why I set the max partitions to 3000 in the session builder. I also checked hive-site.xml; max partitions there are 5000 (this is in hive-client/conf/hive-site.xml):

<property>
<name>hive.exec.dynamic.partition</name>
<value>true</value>
</property>
<property>
<name>hive.exec.dynamic.partition.mode</name>
<value>nonstrict</value>
</property>
<property>
<name>hive.exec.max.dynamic.partitions</name>
<value>5000</value>
</property>
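One way to see which value the Spark session actually uses is the standard SET command, which works in a Hive-enabled session (a quick check, not something the original post ran):

# print the effective value inside the running session; if it still
# shows 1000, the builder config never reached the Hive client
spark.sql("SET hive.exec.max.dynamic.partitions").show(truncate=False)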
When I run:

union_df.write.mode("append").format("orc").partitionBy("country", "date_str").saveAsTable("scratch.daily_test")

I get the following:
pyspark.sql.utils.AnalysisException:
u'Saving data in the Hive serde table scratch.daily_test is not supported yet.
Please use the insertInto() API as an alternative.;'
When I run:

union_df.write.mode("append").format("orc").partitionBy("country", "date_str").insertInto("scratch.daily_test")

I get:

pyspark.sql.utils.AnalysisException:
u"insertInto() can't be used together with partitionBy(). Partition
columns have already be defined for the table. It is not necessary to use
partitionBy().;"
As of now, the following works, but it overwrites the entire external structure to Parquet:

union_df.write.mode("overwrite").partitionBy("country", "date_str").saveAsTable("scratch.daily_test")
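An aside on that workaround: if the table is going to be recreated anyway, adding format("orc") should at least keep the output in ORC rather than Spark's default Parquet. This is a sketch, not something verified in the post:

# hypothetical variant, not from the post: keep ORC when recreating the table
union_df.write.mode("overwrite") \
    .format("orc") \
    .partitionBy("country", "date_str") \
    .saveAsTable("scratch.daily_test")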
Questions:
1. How do you insert into a table in ORC format with partitions?
2. How do you work around hive.exec.max.dynamic.partitions?

Please let me know if you need any additional details.
Labels:
- Apache Hive
- Apache Spark