Created 06-27-2018 11:00 AM
I've created a table in Hive and loaded data in the same.
Now I need to check the table size of my table.
I'm using the below property to do the same:
show tblproperties yourTableName("rawDataSize")
However, I'm unable to know the table size.
Attaching the output for the same :
Created 06-27-2018 11:33 PM
Hi @Prathamesh H!
Instead of using show tblproperties, try to use:
hive> desc formatted salaries; OK # col_name data_type comment gender string age int salary double zip int # Detailed Table Information Database: default Owner: root CreateTime: Wed May 16 15:23:57 UTC 2018 LastAccessTime: UNKNOWN Protect Mode: None Retention: 0 Location: hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries Table Type: EXTERNAL_TABLE Table Parameters: COLUMN_STATS_ACCURATE {\"BASIC_STATS\":\"true\"} EXTERNAL TRUE numFiles 1 numRows 50 rawDataSize 732 totalSize 781 transient_lastDdlTime 1529819960 # Storage Information SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe InputFormat: org.apache.hadoop.mapred.TextInputFormat OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Compressed: No Num Buckets: -1 Bucket Columns: [] Sort Columns: [] Storage Desc Params: field.delim , serialization.format , Time taken: 0.609 seconds, Fetched: 36 row(s) hive> exit; [hive@node3 ~]$ hdfs dfs -ls hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries Found 1 items -rwxrwxrwx 3 root root 781 2018-05-16 04:53 hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries/salaries.txtPS: and it's always a good idea to run
msck repair table <table> --if your table is external
analyze table <table> compute statistics --specially if your table has a lot of inserts
More details on:
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables–ANALYZE
Hope this helps!
Created 06-28-2018 01:47 PM
I tried doing as you mentioned. But, still I'm unable to see the raw data size. It's blank. Any other method ?
Created 06-28-2018 02:13 PM
@Vinicius Higa Murakami My table is partitioned. I was able to find the size of each partitions by the steps mentioned above. However, i need to find the total size of partitions in one go. Is it feasible ?
Created 06-28-2018 07:29 PM
Hi @Prathamesh H!
Sorry about my delay, so regarding your issue. Hmm for partitioned table, afaik, you'll have to summarize per partition, unfortunately 😞
I just heard around, that it's possible to get the size on Hive 2.0. I'm not sure, i didn't test it.
One last thing, the command that i had sent to you, in this case (for partitioned table) would be:
analyze table <table> partition(col1,col2) compute statistics;Hope this helps!