- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Count query is returning different values on two same tables
- Labels:
-
Apache Hive
Created 03-11-2021 12:53 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi All,
I have migrated a hive table from one cluster to another cluster. I have exported table to hdfs first, then moved it to another cluster and imported to hive. While doing that i have used below commands :
hive -e "EXPORT TABLE <dbname>.<tablename> TO 'hdfs://<cluster-1-active-namenode-IP>:8020/tmp/export_<tablename>/';" > <tablename>.log
hadoop distcp hdfs://<cluster-1-active-namenode-IP>:8020/tmp/export_<tablename>/ hdfs://<cluster-2-active-namenode-IP>:8020/tmp/export_<tablename>/ > export_<tablename>.log
IMPORT TABLE <dbname>.<tablename> FROM 'hdfs://<cluster-2-active-namenode-IP>:8020/tmp/export_<tablename>/';
After moving this table to hive in new cluster, i have created a new table and inserted all table data to a new table like below :
insert into table <new_tablename> select * from <tablename>;
For example, below command returns approximately 234 millions as row count :
select count(*) from <tablename>;
However below commands only 42 millions as row count :
select count(*) from <new_tablename>;
What could be the cause of this problem?
Created 03-11-2021 05:46 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
The tables where you count in your example, are both on the new cluster, correct?
Are the tables ACID?
What version of hive are you using?
Created on 03-11-2021 10:50 PM - edited 03-11-2021 10:51 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
Both tables are on the new cluster and they are stored as ORC format, so they are acid tables and support insert/update/delete operations. I am using Hive 3.1.0 .
Created 03-11-2021 11:46 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi,
can you try to force hive compaction?
ALTER TABLE <db_name>.<table_name> PARTITION (<partition_name>='<partition_value>') COMPACT 'major';
with the show compactions command it checks when it has finished and the result (failed or ok).
After this it tries to execute the count again
Created 03-12-2021 12:24 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
COuld you please collect STATS and check,if it returns in corect counts
