Support Questions

Find answers, ask questions, and share your expertise

How to query table/column comments in HIVE Metastore?

avatar
Contributor

I am trying to figure out how to query table and column comments (descriptions) in HIVE Metastore. If there is a way to add the comments to HIVE tables/columns, shouldn't be a way to query/report on the comments from HIVE Metastore?

Please help 🙂

Best Regards, Michael

1 ACCEPTED SOLUTION

avatar
Contributor

Your response is a bit overcomplicated. I resolved my own question. Below is a type of a query from HIVE Metastore I was looking for:

SELECT DBS.NAME AS OWNER,
TBLS.TBL_NAME as OBJECT_NAME,
TBL_COMMENTS.TBL_COMMENT as OBJECT_DESCRIPTION,
TBLS.TBL_ID as OBJECT_ID,
TBLS.TBL_TYPE as OBJECT_TYPE,
'VALID' as OBJECT_STATUS,
COLUMNS_V2.COLUMN_NAME,
COLUMNS_V2.COMMENT as COLUMN_DESCRIPTION,
COLUMNS_V2.TYPE_NAME AS DATA_TYPE
COLUMNS_V2.*
FROM DBS
JOIN TBLS
ON DBS.DB_ID = TBLS.DB_ID
JOIN SDS
ON TBLS.SD_ID = SDS.SD_ID
JOIN COLUMNS_V2
ON COLUMNS_V2.CD_ID = SDS.CD_ID
JOIN (SELECT DISTINCT
TBL_ID,
TBL_COMMENT
FROM (SELECT TBLS.TBL_ID TBL_ID,
TABLE_PARAMS.PARAM_KEY,
TABLE_PARAMS.PARAM_VALUE,
CASE WHEN TABLE_PARAMS.PARAM_KEY = 'comment'
THEN TABLE_PARAMS.PARAM_VALUE
ELSE ''
END TBL_COMMENT
FROM TBLS
JOIN TABLE_PARAMS
ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID) TBL_COMMENTS_INTERNAL) TBL_COMMENTS
ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID;

View solution in original post

5 REPLIES 5

avatar
Expert Contributor

Hi Michael,

For table comments, there is a stackoverflow article around it. For column comments, you can simply run the hive command 'DESCRIBE tablename;', and you should see a comment column in the results.

The easiest way would be using Apache Atlas, if you have Atlas installed, you should be able to see all the table/column metadata, including comments in the Atlas UI.

For instance, I added some comments to an existing Hive table, called 'airline_tweets'

83408-addtablecomment.png

Then I am able to see the table/column comments in Atlas

83409-atlas1.png

Atlas is great tool to use for enterprise data governance. Hope that helps!

If you found it resolved the issue, please "accept" the answer, thanks.

avatar
Contributor

@dsun thanks for the response.

Are the table/column comments stored somewhere in HIVE Metastore? I already know about the describe command and Atlas. I need to extract the table/column comments into a table/file, not simply view them on screen. I hope this makes sense.

Does anyone else know how to query table/column comments using HIVE Metastore?

Thanks.

avatar
Expert Contributor

Hi Michael,

As mentioned in that stackoverflow question, we will need to query the underlying RDBMS that Hive Metastore sits on. For instance, in my demo environment, I use MySQL as the back Hive Metastore storage. I will need to connect to that MySql host and query the Hive metadata.

83425-metastorerdmbshost.png

For demonstration purpose, let's add some comments at both table and column level, table name 'airline_tweets' for instance.

Add a comment at the table level:

ALTER TABLE airline_tweets SET TBLPROPERTIES ('comment' = 'Airline Tweets');

Add a comment to column 'tweet_id'

ALTER TABLE airline_tweets CHANGE tweet_id tweet_id STRING COMMENT 'This is column tweet_id';

Now let's connect to the MySQL host as mentioned above, I'm connecting as the root user, you can also use user 'hive'.

[root@scregionm2 ~]# mysql -u root
mysql> use hive;
mysql> show tables;
+---------------------------+
| Tables_in_hive            |
+---------------------------+
| AUX_TABLE                 |
| BUCKETING_COLS            |
| CDS                       |
| COLUMNS_V2                |
| COMPACTION_QUEUE          |
| COMPLETED_COMPACTIONS     |
| COMPLETED_TXN_COMPONENTS  |
| DATABASE_PARAMS           |
| DBS                       |
| DB_PRIVS                  |
| DELEGATION_TOKENS         |
| FUNCS                     |
| FUNC_RU                   |
| GLOBAL_PRIVS              |
| HIVE_LOCKS                |
| IDXS                      |
| INDEX_PARAMS              |
| KEY_CONSTRAINTS           |
| MASTER_KEYS               |
| NEXT_COMPACTION_QUEUE_ID  |
| NEXT_LOCK_ID              |
| NEXT_TXN_ID               |
| NOTIFICATION_LOG          |
| NOTIFICATION_SEQUENCE     |
| NUCLEUS_TABLES            |
| PARTITIONS                |
| PARTITION_EVENTS          |
| PARTITION_KEYS            |
| PARTITION_KEY_VALS        |
| PARTITION_PARAMS          |
| PART_COL_PRIVS            |
| PART_COL_STATS            |
| PART_PRIVS                |
| ROLES                     |
| ROLE_MAP                  |
| SDS                       |
| SD_PARAMS                 |
| SEQUENCE_TABLE            |
| SERDES                    |
| SERDE_PARAMS              |
| SKEWED_COL_NAMES          |
| SKEWED_COL_VALUE_LOC_MAP  |
| SKEWED_STRING_LIST        |
| SKEWED_STRING_LIST_VALUES |
| SKEWED_VALUES             |
| SORT_COLS                 |
| TABLE_PARAMS              |
| TAB_COL_STATS             |
| TBLS                      |
| TBL_COL_PRIVS             |
| TBL_PRIVS                 |
| TXNS                      |
| TXN_COMPONENTS            |
| TYPES                     |
| TYPE_FIELDS               |
| VERSION                   |
| WRITE_SET                 |
+---------------------------+
57 rows in set (0.00 sec)

As you can see all the Hive metadata are stored in those tables. Besides using the 'describe' command in Hive, you can also retrieve the column comments from the Hive metastore database, below is the SQL command I used to query the column comment we just added:

mysql> SELECT c.* FROM hive.TBLS t  JOIN hive.DBS d  ON t.DB_ID = d.DB_ID  JOIN hive.SDS s  ON t.SD_ID = s.SD_ID  JOIN hive.COLUMNS_V2 c  ON s.CD_ID = c.CD_ID  WHERE TBL_NAME = 'airline_tweets' AND d.NAME= 'default'  ORDER by INTEGER_IDX;
+-------+-------------------------+------------------------------+-----------+-------------+
| CD_ID | COMMENT                 | COLUMN_NAME                  | TYPE_NAME | INTEGER_IDX |
+-------+-------------------------+------------------------------+-----------+-------------+
|   141 | This is column tweet_id | tweet_id                     | string    |           0 |
|   141 | NULL                    | airline_sentiment            | string    |           1 |
|   141 | NULL                    | airline_sentiment_confidence | string    |           2 |
|   141 | NULL                    | negativereason               | string    |           3 |
|   141 | NULL                    | negativereason_confidence    | string    |           4 |
|   141 | NULL                    | airline                      | string    |           5 |
|   141 | NULL                    | airline_sentiment_gold       | string    |           6 |
|   141 | NULL                    | name                         | string    |           7 |
|   141 | NULL                    | negativereason_gold          | string    |           8 |
|   141 | NULL                    | retweet_count                | string    |           9 |
|   141 | NULL                    | text                         | string    |          10 |
|   141 | NULL                    | tweet_coord                  | string    |          11 |
|   141 | NULL                    | tweet_created                | string    |          12 |
|   141 | NULL                    | tweet_location               | string    |          13 |
|   141 | NULL                    | user_timezone                | string    |          14 |
+-------+-------------------------+------------------------------+-----------+-------------+

And here is the SQL command for retrieving the table comment we just added:

mysql> SELECT d.* FROM hive.TBLS t  JOIN hive.TABLE_PARAMS d  ON t.TBL_ID = d.TBL_ID  WHERE TBL_NAME = 'airline_tweets';
+--------+-----------------------+----------------+
| TBL_ID | PARAM_KEY             | PARAM_VALUE    |
+--------+-----------------------+----------------+
|      1 | comment               | Airline Tweets |
|      1 | last_modified_by      | dsun           |
|      1 | last_modified_time    | 1532533100     |
|      1 | numFiles              | 0              |
|      1 | numRows               | 14872          |
|      1 | rawDataSize           | 21569185       |
|      1 | totalSize             | 0              |
|      1 | transient_lastDdlTime | 1532533100     |
+--------+-----------------------+----------------+
8 rows in set (0.00 sec)

You can create your own SQL command if you need to retrieve commends for table & columns at the same time, all the tables are there.

Please 'Accept' the answer if you found it resolved your question.

Derek

avatar
Contributor

Your response is a bit overcomplicated. I resolved my own question. Below is a type of a query from HIVE Metastore I was looking for:

SELECT DBS.NAME AS OWNER,
TBLS.TBL_NAME as OBJECT_NAME,
TBL_COMMENTS.TBL_COMMENT as OBJECT_DESCRIPTION,
TBLS.TBL_ID as OBJECT_ID,
TBLS.TBL_TYPE as OBJECT_TYPE,
'VALID' as OBJECT_STATUS,
COLUMNS_V2.COLUMN_NAME,
COLUMNS_V2.COMMENT as COLUMN_DESCRIPTION,
COLUMNS_V2.TYPE_NAME AS DATA_TYPE
COLUMNS_V2.*
FROM DBS
JOIN TBLS
ON DBS.DB_ID = TBLS.DB_ID
JOIN SDS
ON TBLS.SD_ID = SDS.SD_ID
JOIN COLUMNS_V2
ON COLUMNS_V2.CD_ID = SDS.CD_ID
JOIN (SELECT DISTINCT
TBL_ID,
TBL_COMMENT
FROM (SELECT TBLS.TBL_ID TBL_ID,
TABLE_PARAMS.PARAM_KEY,
TABLE_PARAMS.PARAM_VALUE,
CASE WHEN TABLE_PARAMS.PARAM_KEY = 'comment'
THEN TABLE_PARAMS.PARAM_VALUE
ELSE ''
END TBL_COMMENT
FROM TBLS
JOIN TABLE_PARAMS
ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID) TBL_COMMENTS_INTERNAL) TBL_COMMENTS
ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID;

avatar
Contributor

I tried your query, but if the table has no comment, it produces a duplicate record for that table. So I modified it a bit

 

mysql -u hive -p
<ENTER YOUR HIVE PASSWORD>
use metastore;

SELECT * FROM (SELECT DBS.NAME AS OWNER, TBLS.TBL_NAME as OBJECT_NAME, TBL_COMMENTS.TBL_COMMENT as OBJECT_DESCRIPTION, TBLS.TBL_ID as OBJECT_ID, TBLS.TBL_TYPE as OBJECT_TYPE, "VALID" as OBJECT_STATUS,COLUMNS_V2.COLUMN_NAME, COLUMNS_V2.COMMENT as COLUMN_DESCRIPTION, COLUMNS_V2.TYPE_NAME AS DATA_TYPE FROM DBS JOIN TBLS ON DBS.DB_ID = TBLS.DB_ID JOIN SDS ON TBLS.SD_ID = SDS.SD_ID JOIN COLUMNS_V2 ON COLUMNS_V2.CD_ID = SDS.CD_ID JOIN ( SELECT DISTINCT TBL_ID, TBL_COMMENT FROM ( SELECT TBLS.TBL_ID TBL_ID,TABLE_PARAMS.PARAM_KEY,TABLE_PARAMS.PARAM_VALUE, TABLE_PARAMS.PARAM_VALUE as TBL_COMMENT FROM TBLS JOIN TABLE_PARAMS ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID WHERE TABLE_PARAMS.PARAM_KEY = "comment" UNION ALL SELECT TBLS.TBL_ID TBL_ID,TABLE_PARAMS.PARAM_KEY,TABLE_PARAMS.PARAM_VALUE, "" as TBL_COMMENT FROM TBLS JOIN TABLE_PARAMS ON TBLS.TBL_ID = TABLE_PARAMS.TBL_ID WHERE TABLE_PARAMS.PARAM_KEY <> "comment" AND TBLS.TBL_ID NOT IN (SELECT TBL_ID FROM TABLE_PARAMS WHERE TABLE_PARAMS.PARAM_KEY = "comment") ) TBL_COMMENTS_INTERNAL) TBL_COMMENTS ON TBLS.TBL_ID = TBL_COMMENTS.TBL_ID) as view WHERE OWNER = "database_name_goes_here" AND OBJECT_NAME = "table_name_goes_here";