
Spark 2.2 bucketBy saveAsTable invalid Hive schema




We would like to implement bucketing on some of our tables, but we are struggling to make them readable in Hive.


First we had an issue with saveAsTable, as stated here. That being solved, we did our bucketing like this:


spark.table("large_table_1")
  .write
  .options(Map("path" -> "/path/warehouse/bucketed_large_table_1"))
  .bucketBy(100, "num1")
  .sortBy("num1")
  .saveAsTable("bucketed_large_table_1")


The bucketing works fine and we avoided some costly shuffle steps in our ETL.
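For context on why the shuffle can be skipped: two tables bucketed on the same column with the same bucket count are already co-partitioned, so a join only needs to match bucket i of one table against bucket i of the other. A simplified sketch of the bucket-assignment idea in plain Scala (illustrative only; Spark actually uses a Murmur3 hash of the column value, not hashCode):

```scala
// Simplified bucket assignment: map a key to one of numBuckets buckets.
// Spark uses a Murmur3 hash internally; hashCode here is illustrative only.
def bucketId(key: Any, numBuckets: Int): Int = {
  val h = key.hashCode % numBuckets
  if (h < 0) h + numBuckets else h // keep the bucket id non-negative
}

// Equal keys always land in the same bucket, so a join on the bucket
// column can pair up bucket files directly without repartitioning.
val left = bucketId(42, 100)
val right = bucketId(42, 100)
assert(left == right)
```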


However, Hive is not happy with our table schema when bucketed:


(Screenshots, 2017-10-10: Hive's view of the bucketed table schema)



Any suggestions?


I believe bucketing support with Hive is being greatly improved at the moment, but it is a shame that we cannot use it properly yet.
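As far as I understand, the root cause is that Spark's bucketing scheme is not the same as Hive's (different hash function and file layout), so Spark 2.x does not populate Hive's own bucketing metadata and instead records the bucket spec only in its private table properties. The fragment below is an illustrative sketch of roughly what those properties look like; the exact keys and values depend on the Spark version and are an assumption on my part:

```sql
-- Illustrative only: approximate TBLPROPERTIES Spark 2.x records for a
-- bucketed table; Hive ignores these and sees a non-bucketed table.
TBLPROPERTIES (
  'spark.sql.sources.provider' = 'parquet',
  'spark.sql.sources.schema.numBuckets' = '100',
  'spark.sql.sources.schema.bucketCol.0' = 'num1'
)
```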