count(*) on Hive tables is returning stale values, and the table statistics are not getting updated with the right values
If you add files outside of Hive (not through an insert), the values will not get updated. That is expected. Anything else might be worth a JIRA. So how do you run the insert, and what version of Hadoop are you on?
I assume you ran the analyze statement again?
Just as an FYI, apart from what pbalasundaram wrote, you can also disable the use of statistics for queries by setting hive.compute.query.using.stats to false.
But it is better to enable autogather and to compute statistics.
I am ingesting data through Hive Insert statements. If I do a select count(some_column), then it shows the right value, but if I do a count(*) it shows stale counts.
Shouldn't the statistics on table be updated automatically after every insert script runs?
Please run ANALYZE TABLE ... COMPUTE STATISTICS on the table to confirm whether this changes the count.
Also check the value of hive.stats.autogather to confirm that it is set to true.
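For reference, a minimal sketch of those two checks in HiveQL. The table name my_db.my_table is a placeholder, not something from this thread, and a partitioned table would typically also need a PARTITION clause on the ANALYZE statement.

```sql
-- Print the current value of the setting (SET with no value just echoes it).
SET hive.stats.autogather;

-- Recompute basic table statistics (row count, file count, sizes).
-- my_db.my_table is a hypothetical name; substitute your own table.
ANALYZE TABLE my_db.my_table COMPUTE STATISTICS;

-- Inspect the stored statistics; numRows appears under Table Parameters.
DESCRIBE FORMATTED my_db.my_table;

-- Compare against what count(*) returns now.
SELECT COUNT(*) FROM my_db.my_table;
```

If numRows and count(*) agree right after the ANALYZE but drift apart again after later inserts, that would suggest statistics are not being gathered automatically.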
This will happen if you load data outside of Hive DDL / DML commands (for example, through third-party ETL tools).
To deal with it, set hive.compute.query.using.stats to false (the property mentioned earlier in this thread) globally. This will cause operations like count(*) to be run as full scans. This does not otherwise interfere with Hive's CBO.
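As a sketch, the session-level form would look something like this (my_db.my_table is a placeholder). Making it global would presumably mean setting the same property in hive-site.xml or through your cluster manager; that part is an assumption about your setup.

```sql
-- Stop Hive from answering simple aggregates from metastore statistics.
SET hive.compute.query.using.stats=false;

-- With the setting above, this should scan the data rather than
-- read the stored row count. my_db.my_table is a hypothetical name.
SELECT COUNT(*) FROM my_db.my_table;
```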