Created 12-07-2023 07:30 AM
Hi
We recently upgraded from Hive 1.1.0-cdh5.16.2 to Hive 3.1.3 and have noticed a difference in behaviour around file generation when a query returns no rows.
On the older version (Hive 1.1.0-cdh5.16.2) it used to create an empty part file (000000_0), whereas since the upgrade no file is created.
We run the query through beeline, and the SQL statement starts like:
Insert overwrite directory 'some/path'
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
I cannot find any definitive documentation to say that this behaviour no longer exists in later versions such as Hive 3.1.3.
Thanks
Created 12-07-2023 09:59 AM
@RS_11 Welcome to the Cloudera Community!
To help you get the best possible solution, I have tagged our Hive experts @Shmoo @cravani who may be able to assist you further.
Please keep us updated on your post, and we hope you find a satisfactory solution to your query.
Regards,
Diana Torres
Created 12-07-2023 11:34 AM
If I grasp your inquiry accurately, you are referring to the behavior in older CDH versions (specifically 5.x), where executing insert overwrite with a selection of 0 rows resulted in the creation of a 0-byte file named 000000_0. However, in the newer Hive version 3.1.3, you observe that no 0-byte files are generated in such scenarios.
Assuming my understanding is correct, may I ask why you need a 0-byte file to be generated? Zero-byte and small files are generally considered inefficient on HDFS, because every file adds NameNode metadata regardless of its size. Note that there have been various changes in the past related to 0-byte files, as documented in the following issues:
Created 12-11-2023 02:00 PM
@RS_11 Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. Thanks.
Regards,
Diana Torres