Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Hive - Get number of rows, total size resulted in a query

Highlighted

Hive - Get number of rows, total size resulted in a query

New Contributor

I have a query(ex: below) for which I want to get the statistics like how many rows it results, what is the size of a row.

SELECT t1.c1 ,t1.c2 FROM t1 WHERE NOT (t1.c1 IS NULL );

Is there any way to get it?

I tried "explain extended" as below, which is not printing the details that I want.

Explain Extended SELECT t1.c1 ,t1.c2 FROM t1 WHERE NOT (t1.c1 IS NULL );

Any help is appreciated.

5 REPLIES 5
Highlighted

Re: Hive - Get number of rows, total size resulted in a query

@Srikanth Puli

You need to use "describe extended <table_name>;" as below:

describe extended tableex5 ;

| Detailed Table Information | Table(tableName:tableex5, dbName:default, owner:hive, createTime:1487032307, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col1, type:string, comment:null), FieldSchema(name:col2, type:int, comment:null)], location:hdfs://ssnode253stats.openstacklocal:8020/apps/hive/warehouse/tableex5, inputFormat:org.apache.hadoop.mapred.TextInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.contrib.serde2.MultiDelimitSerDe, parameters:{serialization.format=1, field.delim=%|%}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=24, numRows=2, rawDataSize=0, COLUMN_STATS_ACCURATE={"BASIC_STATS":"true"}, numFiles=1, transient_lastDdlTime=1487032937}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE) |

Highlighted

Re: Hive - Get number of rows, total size resulted in a query

New Contributor

Thanks for your reply.

I need these details on a query rather than on a table. Hope you got my point.

Highlighted

Re: Hive - Get number of rows, total size resulted in a query

Hi @Srikanth Puli, you cannot get this information before running the your query. Statistics can return only general information about the table, like how many records, total size in bytes, etc. How can anyone know how many non-null c1 fields are there without running the query? And how about non-null c1 and non-null c2 and ... many other conditions? So, please run the query to find out: SELECT COUNT(*), MAX(LENGTH(t1.c1)) FROM t1 WHERE t1.c1 IS NOT NULL; provided that c1 is string.

Highlighted

Re: Hive - Get number of rows, total size resulted in a query

@Srikanth Puli

Please, have a look at this: https://cwiki.apache.org/confluence/display/Hive/StatsDev#StatsDev-ExistingTables–ANALYZE

For a non-partitioned table, you can issue the command:

ANALYZE TABLE Table1 COMPUTE STATISTICS FOR COLUMNS;

to gather column statistics of the table (Hive 0.10.0 and later).

If Table1 is a partitioned table, then for basic statistics you have to specify partition specifications like above in the analyze statement. Otherwise a semantic analyzer exception will be thrown.

However for column statistics, if no partition specification is given in the analyze statement, statistics for all partitions are computed.

You can view the stored statistics by issuing the DESCRIBE command. Statistics are stored in the Parameters array. Suppose you issue the analyze command for the whole table Table1, then issue the command:

DESCRIBE EXTENDED TABLE1;

then among the output, the following would be displayed:

... , parameters:{numPartitions=4, numFiles=16, numRows=2000, totalSize=16384, ...}, ....

I hope this helps.

Highlighted

Re: Hive - Get number of rows, total size resulted in a query

Contributor

For getting the number of rows in result set you could use this query

Select count(*)

from

(SELECT t1.c1 ,t1.c2 FROM t1 WHERE NOT (t1.c1 IS NULL )) AS table_1;

Don't have an account?
Coming from Hortonworks? Activate your account here