Created 09-29-2021 07:38 PM
I'm trying to read tables and columns and columns' type from Metastore DB. I checked "COLUMN_V2" table and it doesn't have TBL_ID, but it has CD_ID ! Then, I released that there is a table named CDS having one column which is "CD_ID" and the values are as the same as TBL_ID in TBLS table. I want to know if below query using CD_ID is a reliable query or not.
Select tbls.TBL_NAME, cv2.COLUMN_NAME,cv2.TYPE_NAME from TBLS tbls
JOIN COLUMNS_V2 cv2 on tbls.TBL_ID = cv2.CD_ID
And I would be grateful if I can know more about below concerns:
1. What is the purpose of CDS table?
2. why TBL_ID was not enough and CD_ID was needed?
3. What does "CD" stands for?
4. Does CD_ID has the same value as TBL_ID all the time? or there are scenarios that they can be different?
Created 09-29-2021 11:52 PM
CDS table is for Column Descriptors. CD stands for column descriptor. It is used to decouple columns from tables/partitions to reduce storage needs. tables/partitions are almost 1-1 map to these descriptors… but a table may have more than one Column Descriptor in the case of schema evolution!
Created 09-29-2021 11:52 PM
CDS table is for Column Descriptors. CD stands for column descriptor. It is used to decouple columns from tables/partitions to reduce storage needs. tables/partitions are almost 1-1 map to these descriptors… but a table may have more than one Column Descriptor in the case of schema evolution!
Created 09-30-2021 12:14 AM
Thank you @asish , your explanation was really helpful.
Is there any books/webpage/article that I can read to know more about Metastore and Hive metadata?
Created 09-30-2021 01:07 AM
@Mas_Jamie apologies. Unfortunately we dont have the document 😞