Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

Hive - Get number of rows, total size resulted in a query

avatar
New Contributor

I have a query(ex: below) for which I want to get the statistics like how many rows it results, what is the size of a row.

SELECT t1.c1 ,t1.c2 FROM t1 WHERE NOT (t1.c1 IS NULL );

Is there any way to get it?

I tried "explain extended" as below, which is not printing the details that I want.

Explain Extended SELECT t1.c1 ,t1.c2 FROM t1 WHERE NOT (t1.c1 IS NULL );

Any help is appreciated.

5 REPLIES 5

avatar
@Srikanth Puli

You need to use "describe extended <table_name>;" as below:

describe extended tableex5 ;

| Detailed Table Information | Table(tableName:tableex5, dbName:default, owner:hive, createTime:1487032307, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:null), FieldSchema(name:col2, type:int, comment:null)], location:hdfs://ssnode253stats.openstacklocal:8020/apps/hive/warehouse/tableex5, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe, parameters:{serialization.format=1, field.delim=%|%}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=24, numRows=2, rawDataSize=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=1, transient_lastDdlTime=1487032937}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) |

avatar
New Contributor

Thanks for your reply.

I need these details on a query rather than on a table. Hope you got my point.

avatar
Master Guru

Hi @Srikanth Puli, you cannot get this information before running the your query. Statistics can return only general information about the table, like how many records, total size in bytes, etc. How can anyone know how many non-null c1 fields are there without running the query? And how about non-null c1 and non-null c2 and ... many other conditions? So, please run the query to find out: SELECT COUNT(*), MAX(LENGTH(t1.c1)) FROM t1 WHERE t1.c1 IS NOT NULL; provided that c1 is string.

avatar
@Srikanth Puli

Please, have a look at this: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables–ANALYZE

For a non-partitioned table, you can issue the command:

ANALYZE TABLE Table1 COMPUTE STATISTICS FOR COLUMNS;

to gather column statistics of the table (Hive 0.10.0 and later).

If Table1 is a partitioned table, then for basic statistics you have to specify partition specifications like above in the analyze statement. Otherwise a semantic analyzer exception will be thrown.

However for column statistics, if no partition specification is given in the analyze statement, statistics for all partitions are computed.

You can view the stored statistics by issuing the DESCRIBE command. Statistics are stored in the Parameters array. Suppose you issue the analyze command for the whole table Table1, then issue the command:

DESCRIBE EXTENDED TABLE1;

then among the output, the following would be displayed:

... , parameters:{numPartitions=4, numFiles=16, numRows=2000, totalSize=16384, ...}, ....

I hope this helps.

avatar
Rising Star

For getting the number of rows in result set you could use this query

Select count(*)

from

(SELECT t1.c1 ,t1.c2 FROM t1 WHERE NOT (t1.c1 IS NULL )) AS table_1;