Hive bucketed table from Spark 2.3

Hi, we have been using an HDI instance with Spark 2.2, where we load data from Spark into a bucketed Hive table. We recently looked at moving to HDP 2.6 on Cloudbreak, but we can't get the same code working: it fails with the error "is bucketed but Spark currently does NOT populate bucketed output which is compatible with Hive". Is there a way to enable this functionality? If not, is there a reason it works on HDI with Spark 2.2?


Re: Hive bucketed table from Spark 2.3


Creating Hive bucketed tables is supported from Spark 2.3 (Jira SPARK-17729). By default, Spark disallows writing output to Hive bucketed tables.

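Setting `hive.enforce.bucketing=false` and `hive.enforce.sorting=false` will allow you to save to Hive bucketed tables. Note that, as the error message says, the output Spark writes this way is not bucketed in a Hive-compatible layout.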

If you want, you can set those two properties in Custom spark2-hive-site-override in Ambari; all Spark2 applications will then pick up the configuration.
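
To try the two properties for a single application rather than cluster-wide, something like the following PySpark sketch may help. It is only an illustration under a few assumptions: the helper names (`apply_overrides`, `spark_submit_conf_args`) and the table name `mydb.bucketed_table` are hypothetical, and it relies on Spark passing session-level settings, or `spark.hadoop.`-prefixed settings given to `spark-submit`, through to the Hadoop configuration used for the Hive insert.

```python
# The two Hive properties named in the answer above; both need to be "false".
HIVE_BUCKETING_OVERRIDES = {
    "hive.enforce.bucketing": "false",
    "hive.enforce.sorting": "false",
}


def apply_overrides(spark, overrides=HIVE_BUCKETING_OVERRIDES):
    """Set the properties on an existing SparkSession (session-level only)."""
    for key, value in overrides.items():
        spark.conf.set(key, value)
    return spark


def spark_submit_conf_args(overrides=HIVE_BUCKETING_OVERRIDES):
    """Build the equivalent --conf arguments for spark-submit.

    The spark.hadoop. prefix asks Spark to copy each property into the
    Hadoop configuration of the application.
    """
    args = []
    for key, value in sorted(overrides.items()):
        args += ["--conf", "spark.hadoop.{}={}".format(key, value)]
    return args


if __name__ == "__main__":
    # Print the flags to paste into a spark-submit command line.
    print(" ".join(spark_submit_conf_args()))

    # Inside a Spark job (requires PySpark and Hive support on the cluster):
    #   from pyspark.sql import SparkSession
    #   spark = SparkSession.builder.enableHiveSupport().getOrCreate()
    #   apply_overrides(spark)
    #   df.write.insertInto("mydb.bucketed_table")  # hypothetical table name
```

Either route only lifts the check; it does not make the files Spark writes follow Hive's bucketing layout.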

For more details, refer to Slideshare.
