Created 05-11-2017 02:55 PM
Hi, is it possible to find out the number of rows in a table from the hive metastore?
I don't want to run "select count(*) from <table>" from the hive prompt. Rather I am looking to find that count from the hive metadata.
Appreciate any insights.
Created 05-11-2017 03:59 PM
Hi @n c
Hcatalog holds metadata of a table details like Schema, Index, Roles, Structure, Bucketing, Partitions keys, Columns, privileges, When the table was created, By whom it was created etc. But It doesn't contain any details about how many records are stored in each table.
Created 05-12-2017 08:35 AM
Alternatively analyze table tbl_name compute statistics; will give you no of files, no of records and its size.
Hope it helps.
Created 05-11-2017 11:37 PM
If table statistics are updated you can run DESC FORMATTED <table_name>. This will display row count.
Created 09-29-2017 01:45 PM
As of the HDP 2.6.1 release here is a query that I use to find row counts on a specific partitioned table:
SELECT * FROM hive.PARTITION_PARAMS AS A, hive.PARTITIONS AS B WHERE A.PARAM_KEY='numRows' and A.PART_ID=B.PART_ID and A.PART_ID IN ( SELECT PART_ID FROM hive.PARTITIONS WHERE TBL_ID=(SELECT A.TBL_ID FROM hive.TBLS AS A, hive.DBS AS B WHERE A.DB_ID=B.DB_ID AND B.NAME='DATABASE_NAME' AND A.TBL_NAME='TABLE_NAME'))
This specific query gives me row counts for each partition of a specific table. You need to replace DATABASE_NAME and TABLE_NAME with the one you are interested in.
MySQL metastore but should work with others.
Created 11-08-2017 02:20 PM
It works after I set this in hive-site.xml
I am not sure the reason, but it was forced to run mapreduce
hive.compute.query.using.stats=false