Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Unable to check size of Hive Table

avatar

I've created a table in Hive and loaded data in the same.

Now I need to check the table size of my table.

I'm using the below property to do the same:

show tblproperties yourTableName("rawDataSize")

However, I'm unable to know the table size.

Attaching the output for the same :

horton.png

4 REPLIES 4

avatar

Hi @Prathamesh H!
Instead of using show tblproperties, try to use:

hive> desc formatted salaries;
OK
# col_name            	data_type           	comment             
	 	 
gender              	string              	                    
age                 	int                 	                    
salary              	double              	                    
zip                 	int                 	                    
	 	 
# Detailed Table Information	 	 
Database:           	default             	 
Owner:              	root                	 
CreateTime:         	Wed May 16 15:23:57 UTC 2018	 
LastAccessTime:     	UNKNOWN             	 
Protect Mode:       	None                	 
Retention:          	0                   	 
Location:           	hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries	 
Table Type:         	EXTERNAL_TABLE      	 
Table Parameters:	 	 
	COLUMN_STATS_ACCURATE	{\"BASIC_STATS\":\"true\"}
	EXTERNAL            	TRUE                
	numFiles            	1                   
	numRows             	50                  
	rawDataSize         	732                 
	totalSize           	781                 
	transient_lastDdlTime	1529819960          
	 	 
# Storage Information	 	 
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
Compressed:         	No                  	 
Num Buckets:        	-1                  	 
Bucket Columns:     	[]                  	 
Sort Columns:       	[]                  	 
Storage Desc Params:	 	 
	field.delim         	,                   
	serialization.format	,                   
Time taken: 0.609 seconds, Fetched: 36 row(s)
hive> exit;
[hive@node3 ~]$ hdfs dfs -ls hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries
Found 1 items
-rwxrwxrwx   3 root root        781 2018-05-16 04:53 hdfs://Admin-TrainingNS/apps/hive/warehouse/salaries/salaries.txt
PS: and it's always a good idea to run
msck repair table <table> --if your table is external
analyze table <table> compute statistics --specially if your table has a lot of inserts

More details on:
https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables–ANALYZE

Hope this helps!

avatar

I tried doing as you mentioned. But, still I'm unable to see the raw data size. It's blank. Any other method ?

avatar

@Vinicius Higa Murakami My table is partitioned. I was able to find the size of each partitions by the steps mentioned above. However, i need to find the total size of partitions in one go. Is it feasible ?

avatar

Hi @Prathamesh H!
Sorry about my delay, so regarding your issue. Hmm for partitioned table, afaik, you'll have to summarize per partition, unfortunately 😞
I just heard around, that it's possible to get the size on Hive 2.0. I'm not sure, i didn't test it.
One last thing, the command that i had sent to you, in this case (for partitioned table) would be:

analyze table <table> partition(col1,col2) compute statistics;
Hope this helps!