In Hive, instead of running 'select count(*) from <table>', can I run on the hive metadata (in MySQL) :
use hive; select d.NAME, t.TBL_NAME, tp.PARAM_VALUE from TBLS t, TABLE_PARAMS tp, DBS d where t.TBL_ID = tp.TBL_ID and PARAM_KEY = 'numRows' and d.DB_ID = t.DB_ID;
I was thinking that if the above works, that might be better than running select count(*)... which will result in a mapreduce job.
Appreciate the insights.
You can fetch the stats from within hive. See the following thread:
You can use the following command:
-- In Hive1 for unpartitioned table SHOW tblproperties <tablename>; -- In Hive1 for partitioned table DESC formatted <tablename> partition (<partitionspec>); -- If using Hive2 on HDP-2.6, following will work on both partitioned and unpartitioned table DESC formatted <tablename>;
Hive will by default return the count based on table statistics when executing
SELECT COUNT(*) FROM tblName;
It will not generate a tez/MR task unless the table does not have statistics. If you are using an external table and are getting an incorrect count you can run
ANALYZE TABLE tblName COMPUTE STATISTICS;
to update the metadata.