<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>Question: Large hive metastore db size when using streaming API in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Large-hive-metastore-db-size-when-using-streaming-API/m-p/37452#M19494</link>
    <description>&lt;P&gt;Hi, I'm using the Hive Streaming API to write data to Hive. Recently I looked into the metastore DB and found that the COMPLETED_TXN_COMPONENTS, TXNS, and TXN_COMPONENTS tables take up a large amount of space; COMPLETED_TXN_COMPONENTS alone is almost 3 GB.&lt;/P&gt;&lt;P&gt;I'm concerned about the growing size of these tables; could anyone tell me what they are for? I looked at the data in COMPLETED_TXN_COMPONENTS, and it doesn't seem meaningful beyond being a record of used transaction IDs.&lt;/P&gt;&lt;P&gt;1. Is it safe to clear these tables?&lt;/P&gt;&lt;P&gt;2. If I migrate data from one Hive cluster to another, do I have to keep these 3 tables identical in the new cluster's metastore DB?&lt;/P&gt;</description>
    <pubDate>Fri, 16 Sep 2022 10:04:09 GMT</pubDate>
    <dc:creator>Hef</dc:creator>
    <dc:date>2022-09-16T10:04:09Z</dc:date>
    <item>
      <title>Large hive metastore db size when using streaming API</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Large-hive-metastore-db-size-when-using-streaming-API/m-p/37452#M19494</link>
      <description>&lt;P&gt;Hi, I'm using the Hive Streaming API to write data to Hive. Recently I looked into the metastore DB and found that the COMPLETED_TXN_COMPONENTS, TXNS, and TXN_COMPONENTS tables take up a large amount of space; COMPLETED_TXN_COMPONENTS alone is almost 3 GB.&lt;/P&gt;&lt;P&gt;I'm concerned about the growing size of these tables; could anyone tell me what they are for? I looked at the data in COMPLETED_TXN_COMPONENTS, and it doesn't seem meaningful beyond being a record of used transaction IDs.&lt;/P&gt;&lt;P&gt;1. Is it safe to clear these tables?&lt;/P&gt;&lt;P&gt;2. If I migrate data from one Hive cluster to another, do I have to keep these 3 tables identical in the new cluster's metastore DB?&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:04:09 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Large-hive-metastore-db-size-when-using-streaming-API/m-p/37452#M19494</guid>
      <dc:creator>Hef</dc:creator>
      <dc:date>2022-09-16T10:04:09Z</dc:date>
    </item>
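A minimal sketch of how one could measure how much space these transaction tables actually occupy, assuming a MySQL-backed metastore whose schema is named "metastore" (the schema name and user below are placeholders; adjust for your deployment):

```shell
# Report the on-disk size of the Hive ACID transaction tables.
# Assumes a MySQL-backed metastore; schema name and user are placeholders.
mysql -u hive -p -e "
  SELECT table_name,
         ROUND((data_length + index_length) / 1024 / 1024, 1) AS size_mb
  FROM information_schema.tables
  WHERE table_schema = 'metastore'
    AND table_name IN ('TXNS', 'TXN_COMPONENTS', 'COMPLETED_TXN_COMPONENTS');"
```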
    <item>
      <title>Re: Large hive metastore db size when using streaming API</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Large-hive-metastore-db-size-when-using-streaming-API/m-p/38001#M19495</link>
      <description>The Hive "Streaming" feature is built on top of its unsupported [1] transactional features: &lt;A href="https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest&lt;/A&gt;&lt;BR /&gt;&lt;BR /&gt;This feature (the ACID one) uses the tables you've mentioned when DbTxnManager is in use, as per the suggested configs.&lt;BR /&gt;&lt;BR /&gt;Cloudera does not currently recommend the use of the ACID features, because they are experimental in terms of upstream stability/quality [1].&lt;BR /&gt;&lt;BR /&gt;In any case, judging from the code [2], if all data in your table has been compacted, the entries under COMPLETED_TXN_COMPONENTS should be deleted. Do you see any messages such as "Unable to delete compaction record" in your HMS log, or any WARN-or-higher log from the CompactionTxnHandler class in general? Finding that and then working through the error should help you solve this.&lt;BR /&gt;&lt;BR /&gt;[1] - &lt;A href="http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_rn_hive_ki.html" target="_blank"&gt;http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_rn_hive_ki.html&lt;/A&gt;, specific quote:&lt;BR /&gt;"""&lt;BR /&gt;Hive ACID is not supported&lt;BR /&gt;Hive ACID is an experimental feature and Cloudera does not currently support it.&lt;BR /&gt;"""&lt;BR /&gt;[2] - &lt;A href="https://github.com/cloudera/hive/blob/cdh5.5.2-release/metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java#L320" target="_blank"&gt;https://github.com/cloudera/hive/blob/cdh5.5.2-release/metastore/src/java/org/apache/hadoop/hive/metastore/txn/CompactionTxnHandler.java#L320&lt;/A&gt;, etc.</description>
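Following the suggestion above, a sketch of how one might scan the Hive Metastore log for those compaction-cleanup failures; the default log path used here is an assumption, so adjust it for your installation:

```shell
# Scan the Hive Metastore log for failed cleanup of COMPLETED_TXN_COMPONENTS
# entries. The log path below is a placeholder for your deployment; override
# it via the HMS_LOG environment variable.
HMS_LOG="${HMS_LOG:-/var/log/hive/hivemetastore.log}"
if [ -f "$HMS_LOG" ]; then
  # "|| true" keeps the script's exit status clean when nothing matches.
  grep -E 'Unable to delete compaction record|WARN.*CompactionTxnHandler' "$HMS_LOG" || true
fi
```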
      <pubDate>Sun, 28 Feb 2016 09:29:13 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Large-hive-metastore-db-size-when-using-streaming-API/m-p/38001#M19495</guid>
      <dc:creator>Harsh J</dc:creator>
      <dc:date>2016-02-28T09:29:13Z</dc:date>
    </item>
  </channel>
</rss>

