Created 03-11-2021 12:53 AM
Hi All,
I have migrated a hive table from one cluster to another cluster. I have exported table to hdfs first, then moved it to another cluster and imported to hive. While doing that i have used below commands :
hive -e "EXPORT TABLE <dbname>.<tablename> TO 'hdfs://<cluster-1-active-namenode-IP>:8020/tmp/export_<tablename>/';" > <tablename>.log
hadoop distcp hdfs://<cluster-1-active-namenode-IP>:8020/tmp/export_<tablename>/ hdfs://<cluster-2-active-namenode-IP>:8020/tmp/export_<tablename>/ > export_<tablename>.log
IMPORT TABLE <dbname>.<tablename> FROM 'hdfs://<cluster-2-active-namenode-IP>:8020/tmp/export_<tablename>/';
After moving this table to hive in new cluster, i have created a new table and inserted all table data to a new table like below :
insert into table <new_tablename> select * from <tablename>;
For example, below command returns approximately 234 millions as row count :
select count(*) from <tablename>;
However below commands only 42 millions as row count :
select count(*) from <new_tablename>;
What could be the cause of this problem?
Created 03-11-2021 05:46 AM
Hi,
The tables where you count in your example, are both on the new cluster, correct?
Are the tables ACID?
What version of hive are you using?
Created on 03-11-2021 10:50 PM - edited 03-11-2021 10:51 PM
Hi,
Both tables are on the new cluster and they are stored as ORC format, so they are acid tables and support insert/update/delete operations. I am using Hive 3.1.0 .
Created 03-11-2021 11:46 PM
Hi,
can you try to force hive compaction?
ALTER TABLE <db_name>.<table_name> PARTITION (<partition_name>='<partition_value>') COMPACT 'major';
with the show compactions command it checks when it has finished and the result (failed or ok).
After this it tries to execute the count again
Created 03-12-2021 12:24 AM
COuld you please collect STATS and check,if it returns in corect counts