07-26-2022
04:17 AM
Hi Team,

CDP uses the "org.apache.spark.internal.io.cloud.PathOutputCommitProtocol" OutputCommitter, which does not support dynamicPartitionOverwrite. As a workaround, you can set the following parameters in your Spark job.

Code level:

spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.conf.set("spark.sql.parquet.output.committer.class", "org.apache.parquet.hadoop.ParquetOutputCommitter")
spark.conf.set("spark.sql.sources.commitProtocolClass", "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol")

spark-submit/spark-shell:

--conf spark.sql.sources.partitionOverwriteMode=dynamic --conf spark.sql.parquet.output.committer.class=org.apache.parquet.hadoop.ParquetOutputCommitter --conf spark.sql.sources.commitProtocolClass=org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol

Note: If you are using S3, you can disable the S3 committers by setting the spark.cloudera.s3_committers.enabled parameter to false:

--conf spark.cloudera.s3_committers.enabled=false
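If you maintain both the code-level and the command-line variants, it may help to keep the three settings in one place so they cannot drift apart. The following is only a minimal sketch in plain Python: the names COMMITTER_OVERRIDES, to_submit_args and apply_to_session are hypothetical helpers invented for illustration, not part of Spark or CDP. It assumes that "spark" is an already created SparkSession.

```python
# Settings copied from the post above; the keys are Spark configuration names.
# COMMITTER_OVERRIDES is a hypothetical name chosen for this sketch.
COMMITTER_OVERRIDES = {
    "spark.sql.sources.partitionOverwriteMode": "dynamic",
    "spark.sql.parquet.output.committer.class":
        "org.apache.parquet.hadoop.ParquetOutputCommitter",
    "spark.sql.sources.commitProtocolClass":
        "org.apache.spark.sql.execution.datasources.SQLHadoopMapReduceCommitProtocol",
}


def to_submit_args(settings):
    """Render the settings as a flat list of spark-submit/spark-shell arguments."""
    args = []
    for key, value in settings.items():
        args += ["--conf", f"{key}={value}"]
    return args


def apply_to_session(spark, settings):
    """Apply the same settings at code level through spark.conf.set."""
    for key, value in settings.items():
        spark.conf.set(key, value)
```

For example, " ".join(to_submit_args(COMMITTER_OVERRIDES)) produces the flag string for spark-submit, and apply_to_session(spark, COMMITTER_OVERRIDES) sets the same values on a running session before the write.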