I did as @ssubhas suggested, setting the enforcement attributes (hive.enforce.bucketing and hive.enforce.sorting) to false, and enabled dynamic partitioning:
spark.sql("SET hive.enforce.bucketing = false")
spark.sql("SET hive.enforce.sorting = false")
spark.sql("SET spark.hadoop.hive.exec.dynamic.partition = true")
spark.sql("SET spark.hadoop.hive.exec.dynamic.partition.mode = nonstrict")
- Spark could create the bucketed table in Hive without any issues.
- Spark inserted the data into the table, but it completely ignored the fact that the table is bucketed: when I open a partition, I see only one file (see the sketch below for a minimal reproduction).
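For reference, here is a minimal sketch of those steps, assuming Spark 2.x with Hive support. The database/table name comes from the error quoted further below; the schema, bucket count, and sample data are my own assumptions:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("hive-bucketing-repro")
  .enableHiveSupport()  // required so DDL and inserts go through the Hive metastore
  .getOrCreate()

// Both enforcement flags must be false, otherwise Spark throws the
// AnalysisException quoted further below.
spark.sql("SET hive.enforce.bucketing = false")
spark.sql("SET hive.enforce.sorting = false")
spark.sql("SET spark.hadoop.hive.exec.dynamic.partition = true")
spark.sql("SET spark.hadoop.hive.exec.dynamic.partition.mode = nonstrict")

spark.sql("CREATE DATABASE IF NOT EXISTS hive_test_db")
spark.sql("""
  CREATE TABLE IF NOT EXISTS hive_test_db.test_bucketing (
    user_id BIGINT,
    name    STRING
  )
  PARTITIONED BY (event_date STRING)
  CLUSTERED BY (user_id) INTO 4 BUCKETS
  STORED AS ORC
""")

// insertInto matches columns by position, so the partition column goes last.
val df = spark.range(0, 100).selectExpr(
  "id AS user_id",
  "concat('user_', CAST(id AS STRING)) AS name",
  "'2019-01-01' AS event_date")
df.write.mode("append").insertInto("hive_test_db.test_bucketing")
// The insert succeeds, but each partition directory contains a single
// data file instead of the 4 bucket files Hive itself would write.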
However, when inserting we should set hive.enforce.bucketing = true, not false, and with that setting you will face the following error in the Spark logs:
org.apache.spark.sql.AnalysisException: Output Hive table `hive_test_db`.`test_bucketing` is bucketed but Spark currently does NOT populate bucketed output which is compatible with Hive.;
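Continuing the sketch above, turning the flag back on is enough to reproduce this error (either enforcement flag being true triggers the check):

spark.sql("SET hive.enforce.bucketing = true")
// Now the same insert is rejected with the AnalysisException quoted above:
// Spark refuses to write bucketed output it cannot make Hive-compatible.
df.write.mode("append").insertInto("hive_test_db.test_bucketing")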
This means that Spark does not support inserting into bucketed Hive tables: it cannot produce output that respects Hive's bucketing layout.
The first answer to this Stack Overflow question explains that what @ssubhas suggested is a workaround that doesn't guarantee bucketing.
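As a side note (my addition, not from that answer): if Hive compatibility is not a hard requirement, Spark does have its own bucketing scheme through DataFrameWriter.bucketBy, though Hive will not recognize the resulting layout as a bucketed table. A sketch, reusing the df from above with a hypothetical table name:

// Spark-native bucketing: the file layout and hash function differ from
// Hive's, so Hive cannot read this as a bucketed table.
df.write
  .bucketBy(4, "user_id")
  .sortBy("user_id")
  .format("parquet")
  .saveAsTable("hive_test_db.test_bucketing_spark")  // hypothetical table name

Note that bucketBy only works with saveAsTable, not with insertInto.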