Created 07-24-2018 09:33 PM
I am trying to figure out how to query table and column comments (descriptions) in HIVE Metastore. If there is a way to add the comments to HIVE tables/columns, shouldn't be a way to query/report on the comments from HIVE Metastore?
Please help 🙂
Best Regards, Michael
Created 07-25-2018 04:43 PM
Your response is a bit overcomplicated. I resolved my own question. Below is a type of a query from HIVE Metastore I was looking for:
SELECT DBS.NAME AS OWNER,
TBLS.TBL_NAME as OBJECT_NAME,
TBL_COMMENTS.TBL_COMMENT as OBJECT_DESCRIPTION,
TBLS.TBL_ID as OBJECT_ID,
TBLS.TBL_TYPE as OBJECT_TYPE,
'VALID' as OBJECT_STATUS,
COLUMNS_V2.COLUMN_NAME,
COLUMNS_V2.COMMENT as COLUMN_DESCRIPTION,
COLUMNS_V2.TYPE_NAME AS DATA_TYPE
COLUMNS_V2.*
FROM DBS
JOIN TBLS
ON DBS.DB_ID = TBLS.DB_ID
JOIN SDS
ON TBLS.SD_ID = SDS.SD_ID
JOIN COLUMNS_V2
ON COLUMNS_V2.CD_ID = SDS.CD_ID
JOIN (SELECT DISTINCT
TBL_ID,
TBL_COMMENT
FROM (SELECT TBLS.TBL_ID TBL_ID,
TABLE_PARAMS.PARAM_KEY,
TABLE_PARAMS.PARAM_VALUE,
CASE WHEN TABLE_PARAMS.PARAM_KEY = 'comment'
THEN TABLE_PARAMS.PARAM_VALUE
ELSE ''
END TBL_COMMENT
FROM TBLS
JOIN TABLE_PARAMS
ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID) TBL_COMMENTS_INTERNAL) TBL_COMMENTS
ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID;
Created on 07-24-2018 11:08 PM - edited 08-17-2019 09:55 PM
Hi Michael,
For table comments, there is a stackoverflow article around it. For column comments, you can simply run the hive command 'DESCRIBE tablename;', and you should see a comment column in the results.
The easiest way would be using Apache Atlas, if you have Atlas installed, you should be able to see all the table/column metadata, including comments in the Atlas UI.
For instance, I added some comments to an existing Hive table, called 'airline_tweets'
Then I am able to see the table/column comments in Atlas
Atlas is great tool to use for enterprise data governance. Hope that helps!
If you found it resolved the issue, please "accept" the answer, thanks.
Created 07-25-2018 02:51 PM
@dsun thanks for the response.
Are the table/column comments stored somewhere in HIVE Metastore? I already know about the describe command and Atlas. I need to extract the table/column comments into a table/file, not simply view them on screen. I hope this makes sense.
Does anyone else know how to query table/column comments using HIVE Metastore?
Thanks.
Created on 07-25-2018 04:26 PM - edited 08-17-2019 09:54 PM
Hi Michael,
As mentioned in that stackoverflow question, we will need to query the underlying RDBMS that Hive Metastore sits on. For instance, in my demo environment, I use MySQL as the back Hive Metastore storage. I will need to connect to that MySql host and query the Hive metadata.
For demonstration purpose, let's add some comments at both table and column level, table name 'airline_tweets' for instance.
Add a comment at the table level:
ALTER TABLE airline_tweets SET TBLPROPERTIES ('comment' = 'Airline Tweets');
Add a comment to column 'tweet_id'
ALTER TABLE airline_tweets CHANGE tweet_id tweet_id STRING COMMENT 'This is column tweet_id';
Now let's connect to the MySQL host as mentioned above, I'm connecting as the root user, you can also use user 'hive'.
[root@scregionm2 ~]# mysql -u root mysql> use hive; mysql> show tables; +---------------------------+ | Tables_in_hive | +---------------------------+ | AUX_TABLE | | BUCKETING_COLS | | CDS | | COLUMNS_V2 | | COMPACTION_QUEUE | | COMPLETED_COMPACTIONS | | COMPLETED_TXN_COMPONENTS | | DATABASE_PARAMS | | DBS | | DB_PRIVS | | DELEGATION_TOKENS | | FUNCS | | FUNC_RU | | GLOBAL_PRIVS | | HIVE_LOCKS | | IDXS | | INDEX_PARAMS | | KEY_CONSTRAINTS | | MASTER_KEYS | | NEXT_COMPACTION_QUEUE_ID | | NEXT_LOCK_ID | | NEXT_TXN_ID | | NOTIFICATION_LOG | | NOTIFICATION_SEQUENCE | | NUCLEUS_TABLES | | PARTITIONS | | PARTITION_EVENTS | | PARTITION_KEYS | | PARTITION_KEY_VALS | | PARTITION_PARAMS | | PART_COL_PRIVS | | PART_COL_STATS | | PART_PRIVS | | ROLES | | ROLE_MAP | | SDS | | SD_PARAMS | | SEQUENCE_TABLE | | SERDES | | SERDE_PARAMS | | SKEWED_COL_NAMES | | SKEWED_COL_VALUE_LOC_MAP | | SKEWED_STRING_LIST | | SKEWED_STRING_LIST_VALUES | | SKEWED_VALUES | | SORT_COLS | | TABLE_PARAMS | | TAB_COL_STATS | | TBLS | | TBL_COL_PRIVS | | TBL_PRIVS | | TXNS | | TXN_COMPONENTS | | TYPES | | TYPE_FIELDS | | VERSION | | WRITE_SET | +---------------------------+ 57 rows in set (0.00 sec)
As you can see all the Hive metadata are stored in those tables. Besides using the 'describe' command in Hive, you can also retrieve the column comments from the Hive metastore database, below is the SQL command I used to query the column comment we just added:
mysql> SELECT c.* FROM hive.TBLS t JOIN hive.DBS d ON t.DB_ID = d.DB_ID JOIN hive.SDS s ON t.SD_ID = s.SD_ID JOIN hive.COLUMNS_V2 c ON s.CD_ID = c.CD_ID WHERE TBL_NAME = 'airline_tweets' AND d.NAME= 'default' ORDER by INTEGER_IDX; +-------+-------------------------+------------------------------+-----------+-------------+ | CD_ID | COMMENT | COLUMN_NAME | TYPE_NAME | INTEGER_IDX | +-------+-------------------------+------------------------------+-----------+-------------+ | 141 | This is column tweet_id | tweet_id | string | 0 | | 141 | NULL | airline_sentiment | string | 1 | | 141 | NULL | airline_sentiment_confidence | string | 2 | | 141 | NULL | negativereason | string | 3 | | 141 | NULL | negativereason_confidence | string | 4 | | 141 | NULL | airline | string | 5 | | 141 | NULL | airline_sentiment_gold | string | 6 | | 141 | NULL | name | string | 7 | | 141 | NULL | negativereason_gold | string | 8 | | 141 | NULL | retweet_count | string | 9 | | 141 | NULL | text | string | 10 | | 141 | NULL | tweet_coord | string | 11 | | 141 | NULL | tweet_created | string | 12 | | 141 | NULL | tweet_location | string | 13 | | 141 | NULL | user_timezone | string | 14 | +-------+-------------------------+------------------------------+-----------+-------------+
And here is the SQL command for retrieving the table comment we just added:
mysql> SELECT d.* FROM hive.TBLS t JOIN hive.TABLE_PARAMS d ON t.TBL_ID = d.TBL_ID WHERE TBL_NAME = 'airline_tweets'; +--------+-----------------------+----------------+ | TBL_ID | PARAM_KEY | PARAM_VALUE | +--------+-----------------------+----------------+ | 1 | comment | Airline Tweets | | 1 | last_modified_by | dsun | | 1 | last_modified_time | 1532533100 | | 1 | numFiles | 0 | | 1 | numRows | 14872 | | 1 | rawDataSize | 21569185 | | 1 | totalSize | 0 | | 1 | transient_lastDdlTime | 1532533100 | +--------+-----------------------+----------------+ 8 rows in set (0.00 sec)
You can create your own SQL command if you need to retrieve commends for table & columns at the same time, all the tables are there.
Please 'Accept' the answer if you found it resolved your question.
Derek
Created 07-25-2018 04:43 PM
Your response is a bit overcomplicated. I resolved my own question. Below is a type of a query from HIVE Metastore I was looking for:
SELECT DBS.NAME AS OWNER,
TBLS.TBL_NAME as OBJECT_NAME,
TBL_COMMENTS.TBL_COMMENT as OBJECT_DESCRIPTION,
TBLS.TBL_ID as OBJECT_ID,
TBLS.TBL_TYPE as OBJECT_TYPE,
'VALID' as OBJECT_STATUS,
COLUMNS_V2.COLUMN_NAME,
COLUMNS_V2.COMMENT as COLUMN_DESCRIPTION,
COLUMNS_V2.TYPE_NAME AS DATA_TYPE
COLUMNS_V2.*
FROM DBS
JOIN TBLS
ON DBS.DB_ID = TBLS.DB_ID
JOIN SDS
ON TBLS.SD_ID = SDS.SD_ID
JOIN COLUMNS_V2
ON COLUMNS_V2.CD_ID = SDS.CD_ID
JOIN (SELECT DISTINCT
TBL_ID,
TBL_COMMENT
FROM (SELECT TBLS.TBL_ID TBL_ID,
TABLE_PARAMS.PARAM_KEY,
TABLE_PARAMS.PARAM_VALUE,
CASE WHEN TABLE_PARAMS.PARAM_KEY = 'comment'
THEN TABLE_PARAMS.PARAM_VALUE
ELSE ''
END TBL_COMMENT
FROM TBLS
JOIN TABLE_PARAMS
ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID) TBL_COMMENTS_INTERNAL) TBL_COMMENTS
ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID;
Created on 06-28-2020 07:21 AM - edited 06-28-2020 07:37 AM
I tried your query, but if the table has no comment, it produces a duplicate record for that table. So I modified it a bit
mysql -u hive -p
<ENTER YOUR HIVE PASSWORD>
use metastore;
SELECT * FROM (SELECT DBS.NAME AS OWNER, TBLS.TBL_NAME as OBJECT_NAME, TBL_COMMENTS.TBL_COMMENT as OBJECT_DESCRIPTION, TBLS.TBL_ID as OBJECT_ID, TBLS.TBL_TYPE as OBJECT_TYPE, "VALID" as OBJECT_STATUS,COLUMNS_V2.COLUMN_NAME, COLUMNS_V2.COMMENT as COLUMN_DESCRIPTION, COLUMNS_V2.TYPE_NAME AS DATA_TYPE FROM DBS JOIN TBLS ON DBS.DB_ID = TBLS.DB_ID JOIN SDS ON TBLS.SD_ID = SDS.SD_ID JOIN COLUMNS_V2 ON COLUMNS_V2.CD_ID = SDS.CD_ID JOIN ( SELECT DISTINCT TBL_ID, TBL_COMMENT FROM ( SELECT TBLS.TBL_ID TBL_ID,TABLE_PARAMS.PARAM_KEY,TABLE_PARAMS.PARAM_VALUE, TABLE_PARAMS.PARAM_VALUE as TBL_COMMENT FROM TBLS JOIN TABLE_PARAMS ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID WHERE TABLE_PARAMS.PARAM_KEY = "comment" UNION ALL SELECT TBLS.TBL_ID TBL_ID,TABLE_PARAMS.PARAM_KEY,TABLE_PARAMS.PARAM_VALUE, "" as TBL_COMMENT FROM TBLS JOIN TABLE_PARAMS ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID WHERE TABLE_PARAMS.PARAM_KEY <> "comment" AND TBLS.TBL_ID NOT IN (SELECT TBL_ID FROM TABLE_PARAMS WHERE TABLE_PARAMS.PARAM_KEY = "comment") ) TBL_COMMENTS_INTERNAL) TBL_COMMENTS ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID) as view WHERE OWNER = "database_name_goes_here" AND OBJECT_NAME = "table_name_goes_here";