<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Can we check size of Hive tables? If so - how? in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49593#M4978</link>
    <description>I was wondering if stats were needed to have describe extended output the actual file size. I recall something like that.</description>
    <pubDate>Wed, 18 Jan 2017 06:59:47 GMT</pubDate>
    <dc:creator>mbigelow</dc:creator>
    <dc:date>2017-01-18T06:59:47Z</dc:date>
    <item>
      <title>Can we check size of Hive tables? If so - how?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49427#M4971</link>
      <description>&lt;P&gt;I have many tables in Hive and suspect that the size of these tables is causing space issues on HDFS. Is there a way to check the size of Hive tables? If so - how?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;hive&amp;gt; show tables;&lt;BR /&gt;OK&lt;BR /&gt;bee_actions&lt;BR /&gt;bee_bills&lt;BR /&gt;bee_charges&lt;BR /&gt;bee_cpc_notifs&lt;BR /&gt;bee_customers&lt;BR /&gt;bee_interactions&lt;BR /&gt;bee_master_03jun2016_to_17oct2016&lt;BR /&gt;bee_master_18may2016_to_02jun2016&lt;BR /&gt;bee_master_18oct2016_to_21dec2016&lt;BR /&gt;bee_master_20160614_021501&lt;BR /&gt;bee_master_20160615_010001&lt;BR /&gt;bee_master_20160616_010001&lt;BR /&gt;bee_master_20160617_010001&lt;BR /&gt;bee_master_20160618_010001&lt;BR /&gt;bee_master_20160619_010001&lt;BR /&gt;bee_master_20160620_010001&lt;BR /&gt;bee_master_20160621_010002&lt;BR /&gt;bee_master_20160622_010001&lt;BR /&gt;bee_master_20160623_010001&lt;BR /&gt;bee_master_20160624_065545&lt;BR /&gt;bee_master_20160625_010001&lt;BR /&gt;bee_master_20160626_010001&lt;BR /&gt;bee_master_20160627_010001&lt;BR /&gt;bee_master_20160628_010001&lt;BR /&gt;bee_master_20160629_010001&lt;BR /&gt;bee_master_20160630_010001&lt;BR /&gt;bee_master_20160701_010001&lt;BR /&gt;bee_master_20160702_010001&lt;BR /&gt;bee_master_20160703_010001&lt;BR /&gt;bee_master_20160704_010001&lt;BR /&gt;bee_master_20160705_010001&lt;BR /&gt;bee_master_20160706_010001&lt;BR /&gt;bee_master_20160707_010001&lt;BR /&gt;bee_master_20160707_040048&lt;BR /&gt;bee_master_20160708_010001&lt;BR /&gt;bee_master_20160709_010001&lt;BR /&gt;bee_master_20160710_010001&lt;BR /&gt;bee_master_20160711_010001&lt;BR /&gt;bee_master_20160712_010001&lt;BR /&gt;bee_master_20160713_010001&lt;BR /&gt;bee_master_20160714_010001&lt;BR /&gt;bee_master_20160715_010002&lt;BR /&gt;bee_master_20160716_010001&lt;BR /&gt;bee_master_20160717_010001&lt;BR /&gt;bee_master_20160718_010001&lt;BR /&gt;bee_master_20160720_010001&lt;BR 
/&gt;bee_master_20160721_010001&lt;BR /&gt;bee_master_20160723_010002&lt;BR /&gt;bee_master_20160724_010001&lt;BR /&gt;bee_master_20160725_010001&lt;BR /&gt;bee_master_20160726_010001&lt;BR /&gt;bee_master_20160727_010002&lt;BR /&gt;bee_master_20160728_010001&lt;BR /&gt;bee_master_20160729_010001&lt;BR /&gt;bee_master_20160730_010001&lt;BR /&gt;bee_master_20160731_010001&lt;BR /&gt;bee_master_20160801_010001&lt;BR /&gt;bee_master_20160802_010001&lt;BR /&gt;bee_master_20160803_010001&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 10:54:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49427#M4971</guid>
      <dc:creator>Anks2411</dc:creator>
      <dc:date>2022-09-16T10:54:43Z</dc:date>
    </item>
    <item>
      <title>Re: Can we check size of Hive tables? If so - how?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49430#M4972</link>
      <description>describe formatted/extended &amp;lt;table&amp;gt; partition &amp;lt;partition spec&amp;gt;&lt;BR /&gt;&lt;BR /&gt;This will output stats like totalNumberFiles, totalFileSize, maxFileSize, minFileSize, lastAccessTime, and lastUpdateTime.&lt;BR /&gt;&lt;BR /&gt;It does not state outright that the table is X in size, but it would seem that if you include the partition, it will give you a raw data size.&lt;BR /&gt;&lt;BR /&gt;Otherwise, hdfs dfs -du -s -h /path/to/table will do.&lt;BR /&gt;&lt;BR /&gt;&lt;A href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Describe" target="_blank"&gt;https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Describe&lt;/A&gt;</description>
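As a rough sketch of the hdfs dfs -du route, the Python below ranks table directories by size. It assumes the command is run without -h so that sizes are plain byte counts, and that each output line is either "size path" or "size space_with_replicas path" depending on the Hadoop version. The helper names and the sample paths are made up for illustration, and paths containing spaces are not handled.

```python
import subprocess


def parse_du_output(text):
    """Parse `hdfs dfs -du` output (no -h) into (size_bytes, path) pairs."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Expect "size path" or "size space_with_replicas path".
        if len(parts) not in (2, 3) or not parts[0].isdigit():
            continue
        rows.append((int(parts[0]), parts[-1]))
    return rows


def rank_by_size(rows):
    """Return the rows sorted with the largest directory first."""
    return sorted(rows, key=lambda row: row[0], reverse=True)


def du(parent_dir):
    """Run hdfs dfs -du on a directory; needs the hdfs client on PATH."""
    result = subprocess.run(
        ["hdfs", "dfs", "-du", parent_dir],
        capture_output=True, text=True, check=True,
    )
    return parse_du_output(result.stdout)


# Hypothetical sample output, not taken from a real cluster.
SAMPLE = """\
1024  3072  /user/hive/warehouse/small_table
5368709120  16106127360  /user/hive/warehouse/big_table
"""

for size, path in rank_by_size(parse_du_output(SAMPLE)):
    print(f"{size / 1024 ** 3:.2f} GB  {path}")
```

On a real cluster, something like rank_by_size(du("/user/hive/warehouse")) would give the same listing for every table directory under the warehouse path, assuming the tables live there.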
      <pubDate>Fri, 13 Jan 2017 21:40:55 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49430#M4972</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-01-13T21:40:55Z</dc:date>
    </item>
    <item>
      <title>Re: Can we check size of Hive tables? If so - how?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49433#M4973</link>
      <description>&lt;P&gt;Thanks so much for your prompt reply.&lt;/P&gt;&lt;P&gt;I ran the suggested command, but I see the size as 0 even though I know the table has some data. What does that mean?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;hive&amp;gt; describe extended bee_master_20170113_010001&lt;BR /&gt;&amp;gt; ;&lt;BR /&gt;OK&lt;BR /&gt;entity_id string&lt;BR /&gt;account_id string&lt;BR /&gt;bill_cycle string&lt;BR /&gt;entity_type string&lt;BR /&gt;col1 string&lt;BR /&gt;col2 string&lt;BR /&gt;col3 string&lt;BR /&gt;col4 string&lt;BR /&gt;col5 string&lt;BR /&gt;col6 string&lt;BR /&gt;col7 string&lt;BR /&gt;col8 string&lt;BR /&gt;col9 string&lt;BR /&gt;col10 string&lt;BR /&gt;col11 string&lt;BR /&gt;col12 string&lt;/P&gt;&lt;P&gt;Detailed Table Information Table(tableName:bee_master_20170113_010001, dbName:default, owner:sagarpa, createTime:1484297904, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:entity_id, type:string, comment:null), FieldSchema(name:account_id, type:string, comment:null), FieldSchema(name:bill_cycle, type:string, comment:null), FieldSchema(name:entity_type, type:string, comment:null), FieldSchema(name:col1, type:string, comment:null), FieldSchema(name:col2, type:string, comment:null), FieldSchema(name:col3, type:string, comment:null), FieldSchema(name:col4, type:string, comment:null), FieldSchema(name:col5, type:string, comment:null), FieldSchema(name:col6, type:string, comment:null), FieldSchema(name:col7, type:string, comment:null), FieldSchema(name:col8, type:string, comment:null), FieldSchema(name:col9, type:string, comment:null), FieldSchema(name:col10, type:string, comment:null), FieldSchema(name:col11, type:string, comment:null), FieldSchema(name:col12, type:string, comment:null)], location:hdfs://cmilcb521.amdocs.com:8020/user/insighte/bee_data/bee_run_20170113_010001, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, 
compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format=&lt;BR /&gt;Time taken: 0.328 seconds, Fetched: 18 row(s)&lt;BR /&gt;hive&amp;gt; describe formatted bee_master_20170113_010001&lt;BR /&gt;&amp;gt; ;&lt;BR /&gt;OK&lt;BR /&gt;# col_name data_type comment&lt;/P&gt;&lt;P&gt;entity_id string&lt;BR /&gt;account_id string&lt;BR /&gt;bill_cycle string&lt;BR /&gt;entity_type string&lt;BR /&gt;col1 string&lt;BR /&gt;col2 string&lt;BR /&gt;col3 string&lt;BR /&gt;col4 string&lt;BR /&gt;col5 string&lt;BR /&gt;col6 string&lt;BR /&gt;col7 string&lt;BR /&gt;col8 string&lt;BR /&gt;col9 string&lt;BR /&gt;col10 string&lt;BR /&gt;col11 string&lt;BR /&gt;col12 string&lt;/P&gt;&lt;P&gt;# Detailed Table Information&lt;BR /&gt;Database: default&lt;BR /&gt;Owner: sagarpa&lt;BR /&gt;CreateTime: Fri Jan 13 02:58:24 CST 2017&lt;BR /&gt;LastAccessTime: UNKNOWN&lt;BR /&gt;Protect Mode: None&lt;BR /&gt;Retention: 0&lt;BR /&gt;Location: hdfs://cmilcb521.amdocs.com:8020/user/insighte/bee_data/bee_run_20170113_010001&lt;BR /&gt;Table Type: EXTERNAL_TABLE&lt;BR /&gt;Table Parameters:&lt;BR /&gt;COLUMN_STATS_ACCURATE false&lt;BR /&gt;EXTERNAL TRUE&lt;BR /&gt;numFiles 0&lt;BR /&gt;numRows -1&lt;BR /&gt;rawDataSize -1&lt;BR /&gt;totalSize 0&lt;BR /&gt;transient_lastDdlTime 1484297904&lt;/P&gt;&lt;P&gt;# Storage Information&lt;BR /&gt;SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe&lt;BR /&gt;InputFormat: org.apache.hadoop.mapred.TextInputFormat&lt;BR /&gt;OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat&lt;BR /&gt;Compressed: No&lt;BR /&gt;Num Buckets: -1&lt;BR /&gt;Bucket Columns: []&lt;BR /&gt;Sort Columns: []&lt;BR /&gt;Storage Desc Params:&lt;BR /&gt;field.delim \t&lt;BR /&gt;serialization.format \t&lt;BR /&gt;Time taken: 0.081 seconds, Fetched: 48 row(s)&lt;BR /&gt;hive&amp;gt; describe formatted bee_ppv;&lt;BR /&gt;OK&lt;BR 
/&gt;# col_name data_type comment&lt;/P&gt;&lt;P&gt;entity_id string&lt;BR /&gt;account_id string&lt;BR /&gt;bill_cycle string&lt;BR /&gt;ref_event string&lt;BR /&gt;amount double&lt;BR /&gt;ppv_category string&lt;BR /&gt;ppv_order_status string&lt;BR /&gt;ppv_order_date timestamp&lt;/P&gt;&lt;P&gt;# Detailed Table Information&lt;BR /&gt;Database: default&lt;BR /&gt;Owner: sagarpa&lt;BR /&gt;CreateTime: Thu Dec 22 12:56:34 CST 2016&lt;BR /&gt;LastAccessTime: UNKNOWN&lt;BR /&gt;Protect Mode: None&lt;BR /&gt;Retention: 0&lt;BR /&gt;Location: hdfs://cmilcb521.amdocs.com:8020/user/insighte/bee_data/tables/bee_ppv&lt;BR /&gt;Table Type: EXTERNAL_TABLE&lt;BR /&gt;Table Parameters:&lt;BR /&gt;COLUMN_STATS_ACCURATE true&lt;BR /&gt;EXTERNAL TRUE&lt;BR /&gt;numFiles 0&lt;BR /&gt;numRows 0&lt;BR /&gt;rawDataSize 0&lt;BR /&gt;totalSize 0&lt;BR /&gt;transient_lastDdlTime 1484340138&lt;/P&gt;&lt;P&gt;# Storage Information&lt;BR /&gt;SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe&lt;BR /&gt;InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat&lt;BR /&gt;OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat&lt;BR /&gt;Compressed: No&lt;BR /&gt;Num Buckets: -1&lt;BR /&gt;Bucket Columns: []&lt;BR /&gt;Sort Columns: []&lt;BR /&gt;Storage Desc Params:&lt;BR /&gt;field.delim \t&lt;BR /&gt;serialization.format \t&lt;BR /&gt;Time taken: 0.072 seconds, Fetched: 40 row(s)&lt;/P&gt;</description>
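The zeros above do not necessarily mean the table is empty. One way to read the Table Parameters block is sketched in Python below: it treats numRows of -1 or COLUMN_STATS_ACCURATE false as a sign that statistics were never gathered, in which case totalSize cannot be trusted. That interpretation is a heuristic assumed for this sketch rather than documented Hive behaviour, and the function names are illustrative.

```python
STAT_KEYS = (
    "COLUMN_STATS_ACCURATE", "numFiles", "numRows", "rawDataSize", "totalSize",
)


def table_parameters(describe_output):
    """Collect the statistics rows from DESCRIBE FORMATTED text."""
    params = {}
    for line in describe_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in STAT_KEYS:
            params[parts[0]] = parts[1]
    return params


def stats_look_unpopulated(params):
    """Heuristic (an assumption): -1 rows or an inaccurate flag means no stats."""
    return (
        params.get("numRows") == "-1"
        or params.get("COLUMN_STATS_ACCURATE", "").lower() == "false"
    )


# Rows copied from the first DESCRIBE FORMATTED output in the post above.
SAMPLE = """\
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
numFiles 0
numRows -1
rawDataSize -1
totalSize 0
"""

params = table_parameters(SAMPLE)
print(params["totalSize"], stats_look_unpopulated(params))
```

When the check returns True, measuring the table location with hdfs dfs -du is the safer way to get a size.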
      <pubDate>Fri, 13 Jan 2017 22:07:49 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49433#M4973</guid>
      <dc:creator>Anks2411</dc:creator>
      <dc:date>2017-01-13T22:07:49Z</dc:date>
    </item>
    <item>
      <title>Re: Can we check size of Hive tables? If so - how?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49485#M4974</link>
      <description>What does hdfs dfs -du -s -h /path/to/table output?</description>
      <pubDate>Tue, 17 Jan 2017 04:48:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49485#M4974</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-01-17T04:48:42Z</dc:date>
    </item>
    <item>
      <title>Re: Can we check size of Hive tables? If so - how?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49541#M4975</link>
      <description>&lt;P&gt;I got the output. Thanks very much for all your help.&lt;/P&gt;</description>
      <pubDate>Tue, 17 Jan 2017 21:40:06 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49541#M4975</guid>
      <dc:creator>Anks2411</dc:creator>
      <dc:date>2017-01-17T21:40:06Z</dc:date>
    </item>
    <item>
      <title>Re: Can we check size of Hive tables? If so - how?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49563#M4976</link>
      <description>&lt;PRE&gt;448 [GB] hdfs://aewb-analytics-staging-name.example.com:8020/user/hive/warehouse/mybigtable&lt;/PRE&gt;&lt;PRE&gt;8 [GB] hdfs://aewb-analytics-staging-name.example.com:8020/user/hive/warehouse/anotherone&lt;/PRE&gt;&lt;PRE&gt;0 [GB] hdfs://aewb-analytics-staging-name.example.com:8020/user/hive/warehouse/tinyone&lt;/PRE&gt;</description>
      <pubDate>Wed, 18 Jan 2017 01:16:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49563#M4976</guid>
      <dc:creator>ZachRoes</dc:creator>
      <dc:date>2017-01-18T01:16:43Z</dc:date>
    </item>
    <item>
      <title>Re: Can we check size of Hive tables? If so - how?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49573#M4977</link>
      <description>This command should also help you get the size of a Hive table:&lt;BR /&gt;&lt;BR /&gt;ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan];</description>
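To make the bracketed syntax above concrete, here is a small Python sketch that assembles the statement from a table name and an optional partition spec. The function name and the sales table are hypothetical, and values are not escaped, so this is only suitable for trusted input. As far as I recall from the Hive statistics documentation, NOSCAN skips reading the files and gathers only the file count and physical size, which is usually all a size check needs.

```python
def analyze_statement(table, partition=None, noscan=False):
    """Build an ANALYZE TABLE ... COMPUTE STATISTICS statement as a string."""
    sql = f"ANALYZE TABLE {table}"
    if partition:
        # Quote string values; leave numbers bare.
        specs = ", ".join(
            f"{col}='{val}'" if isinstance(val, str) else f"{col}={val}"
            for col, val in partition.items()
        )
        sql += f" PARTITION({specs})"
    sql += " COMPUTE STATISTICS"
    if noscan:
        sql += " NOSCAN"
    return sql


print(analyze_statement("bee_ppv"))
print(analyze_statement("sales", {"year": 2016, "region": "emea"}, noscan=True))
```

After the statement has run, describe formatted on the same table should show the refreshed totalSize and numFiles values under Table Parameters.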
      <pubDate>Wed, 18 Jan 2017 01:38:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49573#M4977</guid>
      <dc:creator>surajacharya</dc:creator>
      <dc:date>2017-01-18T01:38:23Z</dc:date>
    </item>
    <item>
      <title>Re: Can we check size of Hive tables? If so - how?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49593#M4978</link>
      <description>I was wondering if stats were needed to have describe extended output the actual file size. I recall something like that.</description>
      <pubDate>Wed, 18 Jan 2017 06:59:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/49593#M4978</guid>
      <dc:creator>mbigelow</dc:creator>
      <dc:date>2017-01-18T06:59:47Z</dc:date>
    </item>
    <item>
      <title>Re: Can we check size of Hive tables? If so - how?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/62552#M4979</link>
      <description>&lt;P&gt;ANALYZE TABLE db_ip2738.ldl_cohort_with_tests COMPUTE STATISTICS&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This returns nothing in Hive. However, I ran the hdfs command and got two sizes back. The output looked like this:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;hdfs dfs -du -s -h hdfs://hdpprd/data/prod/users/ip2738/ldl_cohort_with_tests&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;result: 2.9 G&amp;nbsp; &amp;nbsp; &amp;nbsp;8.8 G&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp;hdfs://hdpprd/data/prod/users/ip2738/ldl_cohort_with_tests&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Which number is the size of the table?&lt;/P&gt;</description>
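On the question of which number to use: in Hadoop versions whose du prints three columns, the first is the size of the data and the second is the space consumed across all replicas. The Python sketch below parses the human-readable line quoted above and shows that the ratio is close to 3, which would match a replication factor of 3 (an assumption about this cluster, not something the post states). The function name is illustrative, and the parser expects exactly the "size unit size unit path" shape.

```python
UNITS = {"B": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_du_h_line(line):
    """Parse 'size unit size unit path' as printed by hdfs dfs -du -s -h."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError(f"unexpected du -h line: {line!r}")
    data = float(parts[0]) * UNITS[parts[1]]
    with_replicas = float(parts[2]) * UNITS[parts[3]]
    return data, with_replicas, parts[4]


line = "2.9 G  8.8 G  hdfs://hdpprd/data/prod/users/ip2738/ldl_cohort_with_tests"
data, with_replicas, path = parse_du_h_line(line)
print(f"data: {data / 1024 ** 3:.1f} G, replicas: {round(with_replicas / data)}x")
```

Read this way, the table itself is about 2.9 G, and 8.8 G is what it occupies on disk once every replica is counted.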
      <pubDate>Thu, 07 Dec 2017 01:16:35 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/62552#M4979</guid>
      <dc:creator>lotta22</dc:creator>
      <dc:date>2017-12-07T01:16:35Z</dc:date>
    </item>
    <item>
      <title>Re: Can we check size of Hive tables? If so - how?</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/63512#M4980</link>
      <description>&lt;P&gt;Since this is an external table (EXTERNAL_TABLE), Hive will not keep any stats on the table, since it is assumed that another application is changing the underlying data at will. Why keep stats if we can't trust that the data will be the same in another 5 minutes? For a managed (non-external) table, data is manipulated through Hive SQL statements (LOAD DATA, INSERT, etc.), so the Hive system will know about any changes to the underlying data and can update the stats accordingly.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Using the HDFS utilities to check the directory file sizes will give you the most accurate answer.&lt;/P&gt;</description>
      <pubDate>Tue, 09 Jan 2018 20:00:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Can-we-check-size-of-Hive-tables-If-so-how/m-p/63512#M4980</guid>
      <dc:creator>David M.</dc:creator>
      <dc:date>2018-01-09T20:00:43Z</dc:date>
    </item>
  </channel>
</rss>

