About Scass

Scass · ‎09-06-2023

Sorry, I meant to say 5 files above, or at least a multiple of 5, instead of 2.

Scass · ‎09-06-2023

I seem to be having trouble creating a bucketed table and getting Hive (using Spark SQL in PySpark) to recognize that table as bucketed during a join. I have created a simplified table of 10 records with a bucketing key of type integer. I insert the 10 records with values 1-10 for the key column and expect to see 10 files, but see 2 files.

Table with values I will insert into the above bucketed table:

When I look at the number of files created, I see only 2. I was expecting to see 5 files if a mod is done on an integer value. In my real problem I have a string key and am running into memory issues: it appears Hive does not believe the files are bucketed and sorted, so it spills quite a bit of data during the join of the 2 bucketed tables, and the number of underlying files is much greater than the number of buckets.
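
To make the "mod on an integer value" expectation concrete, here is a minimal sketch in plain Python (no Spark or Hive needed). It assumes 5 buckets, which the post implies but does not state, and it assumes the classic Hive rule for integer keys, where the hash of an integer is the integer itself and the bucket is `(hash & Integer.MAX_VALUE) % num_buckets`. The names `NUM_BUCKETS`, `hive_v1_bucket`, and `assign_buckets` are hypothetical helpers for illustration, not part of any Hive or Spark API. Newer Hive bucketing versions and Spark's own `bucketBy` use a Murmur3-based hash, so the actual assignment may differ.

```python
from collections import defaultdict

NUM_BUCKETS = 5  # assumption: the post does not state the bucket count


def hive_v1_bucket(key: int, num_buckets: int) -> int:
    """Bucket id under the assumed rule (hash & Integer.MAX_VALUE) % num_buckets,
    with the hash of an integer key taken to be the integer itself."""
    return (key & 0x7FFFFFFF) % num_buckets


def assign_buckets(keys, num_buckets):
    """Group keys by the bucket they would land in under the assumed rule."""
    buckets = defaultdict(list)
    for key in keys:
        buckets[hive_v1_bucket(key, num_buckets)].append(key)
    return dict(buckets)


# Keys 1-10, as in the simplified table described in the post.
layout = assign_buckets(range(1, 11), NUM_BUCKETS)
for bucket_id in sorted(layout):
    print(bucket_id, layout[bucket_id])
```

Under these assumptions every one of the 5 buckets receives two keys, so a writer that honors the bucketing would be expected to produce 5 files (or a multiple of 5). Seeing only 2 files would be consistent with the output being split by write task rather than by bucket, though the post does not include enough detail to confirm that.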

Last Visited	‎09-21-2023 11:35 AM

Member Since	‎09-06-2023 02:13 PM
Posts	2

Cloudera Community

Re: Hive bucketed table producing less files than ...

Hive bucketed table producing less files than expe...