Large hive metastore db size when using streaming API
Labels: Apache Hive
Created on 02-14-2016 10:53 PM - edited 09-16-2022 03:04 AM
Hi, I'm using the Hive Streaming API to write data to Hive. Recently I looked into the metastore db and found that the COMPLETED_TXN_COMPONENTS, TXNS, and TXN_COMPONENTS tables take up a lot of space; COMPLETED_TXN_COMPONENTS alone is almost 3GB.
I'm concerned about the growing size of these tables. Could anyone tell me what they are for?
I looked at the data in COMPLETED_TXN_COMPONENTS, and it doesn't seem meaningful beyond being a record of used transaction IDs.
1. Is it safe to clear these tables?
2. If I migrate data from one Hive cluster to another, do these 3 tables in the new cluster's metastore db have to be kept identical to the ones in the old cluster?
Created 02-28-2016 01:29 AM
This feature (Hive's ACID transaction support, which the Streaming API relies on) uses the tables you've mentioned when DbTxnManager is configured as the transaction manager, as the suggested configs require.
Cloudera does not currently recommend using the ACID features, because they are still experimental in terms of stability and quality upstream [1].
That said, judging from the code [2], once all data in your table has been compacted, its entries in COMPLETED_TXN_COMPONENTS should be deleted. Do you see any messages such as "Unable to delete compaction record" in your HMS log, or, more generally, any WARN-or-higher log entries from the CompactionTxnHandler class? Finding those and resolving the underlying error should help you solve this.
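As a starting point for that log check, here is a minimal Python sketch that filters Hive Metastore log lines for the "Unable to delete compaction record" message and for WARN-or-higher entries that mention CompactionTxnHandler. It assumes a log4j-style layout in which the level (WARN, ERROR, ...) appears as a separate token on each line; `suspicious_lines` and `scan_log` are hypothetical helper names, not part of Hive.

```python
import re
from typing import Iterable, List

# Message quoted in the answer above.
DELETE_FAILURE = "Unable to delete compaction record"

# Levels treated as "WARN or above" (log4j naming; assumed log layout).
WARN_PLUS = ("WARN", "ERROR", "FATAL")

# First standalone log-level token on the line (assumption: the layout prints one).
LEVEL_RE = re.compile(r"\b(TRACE|DEBUG|INFO|WARN|ERROR|FATAL)\b")


def suspicious_lines(lines: Iterable[str]) -> List[str]:
    """Return log lines that may explain why COMPLETED_TXN_COMPONENTS is not cleaned up."""
    hits = []
    for line in lines:
        # Any line carrying the specific delete-failure message is reported.
        if DELETE_FAILURE in line:
            hits.append(line)
            continue
        # Otherwise report CompactionTxnHandler lines logged at WARN or above.
        level = LEVEL_RE.search(line)
        if "CompactionTxnHandler" in line and level and level.group(1) in WARN_PLUS:
            hits.append(line)
    return hits


def scan_log(path: str) -> List[str]:
    """Apply suspicious_lines to a log file on disk (path is supplied by the caller)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return suspicious_lines(f)
```

For example, `for hit in scan_log("/path/to/hivemetastore.log"): print(hit)`, where the path is a placeholder for wherever your HMS writes its log. Stack-trace continuation lines carry no level token and are not matched, so open the log around each hit to read the full error.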
[1] - http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_rn_hive_ki.html, specific quote:
"""
Hive ACID is not supported
Hive ACID is an experimental feature and Cloudera does not currently support it.
"""
[2] - https://github.com/cloudera/hive/blob/cdh5.5.2-release/metastore/src/java/org/apache/hadoop/hive/met... etc.
