Created 10-11-2024 06:32 AM
I renamed a Hive partitioned table DB1.Tab1 to DB1.Tab1_old and then moved another table with the same name but from another database DB2.Tab1 to the first database DB1.Tab1, using the same rename command. It looks like the partitioned directories were merged because when I deleted the first renamed one (DB1.Tab1_old), the moved table partitions (DB1.Tab1) were deleted. I would like to understand why this happened. Regards
Created 10-16-2024 12:56 PM
Hello @Patriciabqc
It seems this was fixed in CDP-7.1.8 according to the TSB, please see the Knowledge base link below.
Created 10-11-2024 08:32 AM
Are you able to reproduce at will?
If so, provide the Cloudera Runtime version and the associated DDL/DML
Created 10-11-2024 08:42 AM
Hi @Patriciabqc
The behavior depends on what was the table type and it's table properties. The CDP version may be also important as there were some fixes around the rename functionality (whether HDFS location is changed) between the versions.
The behavior is different for managed and external tables. If you rename a managed table, it should rename the HDFS location too, so you should not end up in such situations. Because of that I assume these were external tables.
The table property 'TRANSLATED_TO_EXTERNAL'='TRUE' should make an external table behave like a managed table ("legacy CDH managed table" behavior), so when you rename the table the location is also changed, so probably we can rule out this too, but I mention this for clarity that this is also an option - and maybe a way to avoid this in the future.
Last but not least, for external tables if the table property 'external.table.purge'='true' is there, then it will delete the HDFS location when the table is dropped. The symptoms suggest this was the case.
I hope this explains.
Best regards, Miklos
Created 10-11-2024 11:05 AM
We use CDP 7.1.7. The table is 'external.table.purge'='true'. After the renames, when I dropped the table that was renamed to '_old' it affected the partitions of the one that was moved to the database and renamed to the same name as the other (before being renamed). As if they were both accessing the same directory.
Created 10-14-2024 12:51 AM
Yes, with renaming external tables Hive does not change their locations (to the new desired path, irrespective whether the DB changes or the table name), so once you drop the renamed table, due to the 'external.table.purge'='true' it dropped the original location's data directory.
FYI there were some fixes around the rename functionality, HIVE-24920 fixed in 7.1.7 SP1 and HIVE-26158 fixed in 7.1.7 SP2, not sure if that affects you but it's recommended to apply the latest service pack.
Created 10-16-2024 06:13 AM
Right, thanks for verifying that the partition locations are not renamed. Then it is clearly the HIVE-26158, which has been fixed in 7.1.7 SP2, please plan/test and apply the 7.1.7 SP2 or the even newer 7.1.7 SP3 version (recommended) to avoid this in the future. (one should not need to alter the partition locations when table is renamed)
Best regards
Miklos
Created on 10-16-2024 11:06 AM - edited 10-16-2024 11:07 AM
Created 10-16-2024 11:10 AM
Yes, it seems to be this issue, but we already have 7.1.7 SP2 installed. Maybe SP3 fixed this issue... Thanks!
Created on 10-11-2024 11:02 AM - edited 10-11-2024 11:41 AM
Created 10-15-2024 06:49 AM
When I renamed the table, Hive renamed its HDFS directory, but it seemed that in the metastore database the partitions directories were not renamed. Maybe I needed to run some command to change the partitions locations in the metastore after renaming the table...