Registered: 09-08-2017

Spark 2.2 bucketBy saveAsTable invalid Hive schema




We would like to implement bucketing on some of our tables, but we are struggling to make them readable in Hive.


First we had an issue with saveAsTable, as stated here. That being solved, we did our bucketing like this:


spark.table("large_table_1").write.options(Map("path" -> "/path/warehouse/bucketed_large_table_1")).bucketBy(100, "num1").sortBy("num1").saveAsTable("bucketed_large_table_1")


The bucketing works fine, and we avoided some costly shuffle steps in our ETL.
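For context, the shuffle saving comes from both sides of a join sharing the same bucket spec. A minimal local sketch (not our cluster setup; table names, the toy data, and the local SparkSession are all hypothetical) that checks the joined plan has no Exchange node:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local illustration of a shuffle-free join between two
// tables bucketed identically on the join column "num1".
object BucketedJoinSketch {
  def planHasNoExchange(): Boolean = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("bucketing-sketch")
      // Force a sort-merge join so bucketing (not broadcast) decides the plan.
      .config("spark.sql.autoBroadcastJoinThreshold", "-1")
      .config("spark.sql.adaptive.enabled", "false")
      .getOrCreate()
    import spark.implicits._

    // Two toy tables, both bucketed and sorted on the join key.
    Seq(1, 2, 3, 4).toDF("num1").write
      .bucketBy(4, "num1").sortBy("num1")
      .mode("overwrite").saveAsTable("t1")
    Seq(2, 3, 4, 5).toDF("num1").write
      .bucketBy(4, "num1").sortBy("num1")
      .mode("overwrite").saveAsTable("t2")

    val plan = spark.table("t1").join(spark.table("t2"), "num1")
      .queryExecution.executedPlan.toString
    // With matching bucket specs, no Exchange (shuffle) should appear.
    !plan.contains("Exchange")
  }

  def main(args: Array[String]): Unit =
    println(s"shuffle-free join: ${planHasNoExchange()}")
}
```

Calling explain() on the join (or inspecting executedPlan as above) is a quick way to confirm on a real cluster that the bucketed tables are being picked up and the shuffle is gone.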


However, Hive is not happy with our table schema when the table is bucketed:


[Screenshots from 2017-10-10: Hive reporting an invalid schema for the bucketed table]



Any suggestions?


I believe bucketing support in Hive is being greatly improved at the moment, but it's unfortunate that we cannot use it properly yet.