Support Questions

Find answers, ask questions, and share your expertise

Count query is returning different values on two same tables

avatar
New Contributor

Hi All,

 

I have migrated a hive table from one cluster to another cluster. I have exported table to hdfs first, then moved it to another cluster and imported to hive. While doing that i have used below commands :

 

hive -e "EXPORT TABLE <dbname>.<tablename> TO 'hdfs://<cluster-1-active-namenode-IP>:8020/tmp/export_<tablename>/';" > <tablename>.log

hadoop distcp hdfs://<cluster-1-active-namenode-IP>:8020/tmp/export_<tablename>/    hdfs://<cluster-2-active-namenode-IP>:8020/tmp/export_<tablename>/ > export_<tablename>.log

IMPORT TABLE <dbname>.<tablename>     FROM    'hdfs://<cluster-2-active-namenode-IP>:8020/tmp/export_<tablename>/';

 

After moving this table to hive in new cluster, i have created a new table and inserted all table data to a new table like below :

 

insert into table <new_tablename> select * from <tablename>;

 

For example, below command returns approximately 234 millions as row count :

 

select count(*) from <tablename>;

 

However below commands only 42 millions as row count :

 

select count(*) from <new_tablename>;

 

 

What could be the cause of this problem?

4 REPLIES 4

avatar
Expert Contributor

Hi,

The tables where you count in your example, are both on the new cluster, correct?

 

Are the tables ACID?


What version of hive are you using?

 

avatar
New Contributor

Hi,

 

Both tables are on the new cluster and they are stored  as ORC format, so they are acid tables and support insert/update/delete operations. I am using Hive 3.1.0 .

avatar
Expert Contributor

Hi,

 

can you try to force hive compaction?

ALTER TABLE <db_name>.<table_name> PARTITION (<partition_name>='<partition_value>') COMPACT 'major';

with the show compactions command it checks when it has finished and the result (failed or ok).

 

After this it tries to execute the count again

 

avatar
Guru

COuld you please collect STATS and check,if it returns in corect counts

 

https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_cloud-data-access/content/hive-analyzing-ta...