Reply
New Contributor
Posts: 4
Registered: ‎01-13-2017

Can we check size of Hive tables? If so - how?

[ Edited ]

I have many tables in Hive and suspect size of these tables are causing space issues on HDFS FS. Is there a way to check the size of Hive tables? If so - how?

 

hive> show tables;
OK
bee_actions
bee_bills
bee_charges
bee_cpc_notifs
bee_customers
bee_interactions
bee_master_03jun2016_to_17oct2016
bee_master_18may2016_to_02jun2016
bee_master_18oct2016_to_21dec2016
bee_master_20160614_021501
bee_master_20160615_010001
bee_master_20160616_010001
bee_master_20160617_010001
bee_master_20160618_010001
bee_master_20160619_010001
bee_master_20160620_010001
bee_master_20160621_010002
bee_master_20160622_010001
bee_master_20160623_010001
bee_master_20160624_065545
bee_master_20160625_010001
bee_master_20160626_010001
bee_master_20160627_010001
bee_master_20160628_010001
bee_master_20160629_010001
bee_master_20160630_010001
bee_master_20160701_010001
bee_master_20160702_010001
bee_master_20160703_010001
bee_master_20160704_010001
bee_master_20160705_010001
bee_master_20160706_010001
bee_master_20160707_010001
bee_master_20160707_040048
bee_master_20160708_010001
bee_master_20160709_010001
bee_master_20160710_010001
bee_master_20160711_010001
bee_master_20160712_010001
bee_master_20160713_010001
bee_master_20160714_010001
bee_master_20160715_010002
bee_master_20160716_010001
bee_master_20160717_010001
bee_master_20160718_010001
bee_master_20160720_010001
bee_master_20160721_010001
bee_master_20160723_010002
bee_master_20160724_010001
bee_master_20160725_010001
bee_master_20160726_010001
bee_master_20160727_010002
bee_master_20160728_010001
bee_master_20160729_010001
bee_master_20160730_010001
bee_master_20160731_010001
bee_master_20160801_010001
bee_master_20160802_010001
bee_master_20160803_010001

Posts: 642
Topics: 3
Kudos: 105
Solutions: 67
Registered: ‎08-16-2016

Re: Can we check size of Hive tables? If so - how?

describe formatted/extended <table> partition <partition spec>

This will output stats like totalNumberFiles, totalFileSize, maxFileSize, minFileSize, lastAccessTime, and lastUpdateTime.

So not exactly this table is X size. It would seem that if you include the partition it will give you a raw data size.

Otherwise, hdfs dfs -du -s -h /path/to/table will do.

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Describe
New Contributor
Posts: 4
Registered: ‎01-13-2017

Re: Can we check size of Hive tables? If so - how?

Thanks so much for your prompt reply.

I ran the suggested command but i see size as 0 whereas i know it has some data. So what does that mean?

 

hive> describe extended bee_master_20170113_010001
> ;
OK
entity_id string
account_id string
bill_cycle string
entity_type string
col1 string
col2 string
col3 string
col4 string
col5 string
col6 string
col7 string
col8 string
col9 string
col10 string
col11 string
col12 string

Detailed Table Information Table(tableName:bee_master_20170113_010001, dbName:default, owner:sagarpa, createTime:1484297904, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:entity_id, type:string, comment:null), FieldSchema(name:account_id, type:string, comment:null), FieldSchema(name:bill_cycle, type:string, comment:null), FieldSchema(name:entity_type, type:string, comment:null), FieldSchema(name:col1, type:string, comment:null), FieldSchema(name:col2, type:string, comment:null), FieldSchema(name:col3, type:string, comment:null), FieldSchema(name:col4, type:string, comment:null), FieldSchema(name:col5, type:string, comment:null), FieldSchema(name:col6, type:string, comment:null), FieldSchema(name:col7, type:string, comment:null), FieldSchema(name:col8, type:string, comment:null), FieldSchema(name:col9, type:string, comment:null), FieldSchema(name:col10, type:string, comment:null), FieldSchema(name:col11, type:string, comment:null), FieldSchema(name:col12, type:string, comment:null)], location:hdfs://cmilcb521.amdocs.com:8020/user/insighte/bee_data/bee_run_20170113_010001, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{field.delim= , serialization.format=
Time taken: 0.328 seconds, Fetched: 18 row(s)
hive> describe formatted bee_master_20170113_010001
> ;
OK
# col_name data_type comment

entity_id string
account_id string
bill_cycle string
entity_type string
col1 string
col2 string
col3 string
col4 string
col5 string
col6 string
col7 string
col8 string
col9 string
col10 string
col11 string
col12 string

# Detailed Table Information
Database: default
Owner: sagarpa
CreateTime: Fri Jan 13 02:58:24 CST 2017
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://cmilcb521.amdocs.com:8020/user/insighte/bee_data/bee_run_20170113_010001
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE false
EXTERNAL TRUE
numFiles 0
numRows -1
rawDataSize -1
totalSize 0
transient_lastDdlTime 1484297904

# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.081 seconds, Fetched: 48 row(s)
hive> describe formatted bee_ppv;
OK
# col_name data_type comment

entity_id string
account_id string
bill_cycle string
ref_event string
amount double
ppv_category string
ppv_order_status string
ppv_order_date timestamp

# Detailed Table Information
Database: default
Owner: sagarpa
CreateTime: Thu Dec 22 12:56:34 CST 2016
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: hdfs://cmilcb521.amdocs.com:8020/user/insighte/bee_data/tables/bee_ppv
Table Type: EXTERNAL_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE true
EXTERNAL TRUE
numFiles 0
numRows 0
rawDataSize 0
totalSize 0
transient_lastDdlTime 1484340138

# Storage Information
SerDe Library: org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe
InputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.072 seconds, Fetched: 40 row(s)

Posts: 642
Topics: 3
Kudos: 105
Solutions: 67
Registered: ‎08-16-2016

Re: Can we check size of Hive tables? If so - how?

What does hdfs dfs -du -s -h /path/to/table output?
New Contributor
Posts: 4
Registered: ‎01-13-2017

Re: Can we check size of Hive tables? If so - how?

i got the output. Thanks very much for all your help,

Cloudera Employee
Posts: 20
Registered: ‎01-17-2017

Re: Can we check size of Hive tables? If so - how?

448 [GB] hdfs://aewb-analytics-staging-name.example.com:8020/user/hive/warehouse/mybigtable
 
8 [GB]hdfs://aewb-analytics-staging-name.example.com:8020/user/hive/warehouse/anotherone
 
0 [GB]hdfs://aewb-analytics-staging-name.example.com:8020/user/hive/warehouse/tinyone
Cloudera Employee
Posts: 42
Registered: ‎08-16-2016

Re: Can we check size of Hive tables? If so - how?

This command should also help you get the size of HIVE table :

ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan];
Posts: 642
Topics: 3
Kudos: 105
Solutions: 67
Registered: ‎08-16-2016

Re: Can we check size of Hive tables? If so - how?

I was wondering if stats were needed to have describe extended output the actual file size. I recall something like that.

New Contributor
Posts: 1
Registered: ‎12-06-2017

Re: Can we check size of Hive tables? If so - how?

ANALYZE TABLE db_ip2738.ldl_cohort_with_tests COMPUTE STATISTICS 

 

 

this return nothing in hive. However I ran the hdfs command and got two sizes back. the output looke like this:

 

hdfs dfs -du -s -h hdfs://hdpprd/data/prod/users/ip2738/ldl_cohort_with_tests

 

result:  2.9 G     8.8 G       hdfs://hdpprd/data/prod/users/ip2738/ldl_cohort_with_tests

 

which number is the size of the table?

Announcements