Support Questions

tpzenes · ‎03-11-2021

Hi All,

I have migrated a hive table from one cluster to another cluster. I have exported table to hdfs first, then moved it to another cluster and imported to hive. While doing that i have used below commands :

hive -e "EXPORT TABLE <dbname>.<tablename> TO 'hdfs://<cluster-1-active-namenode-IP>:8020/tmp/export_<tablename>/';" > <tablename>.log

hadoop distcp hdfs://<cluster-1-active-namenode-IP>:8020/tmp/export_<tablename>/    hdfs://<cluster-2-active-namenode-IP>:8020/tmp/export_<tablename>/ > export_<tablename>.log

IMPORT TABLE <dbname>.<tablename>     FROM    'hdfs://<cluster-2-active-namenode-IP>:8020/tmp/export_<tablename>/';

After moving this table to hive in new cluster, i have created a new table and inserted all table data to a new table like below :

insert into table <new_tablename> select * from <tablename>;

For example, below command returns approximately 234 millions as row count :

select count(*) from <tablename>;

However below commands only 42 millions as row count :

select count(*) from <new_tablename>;

What could be the cause of this problem?

isoardi · ‎03-11-2021

Hi,

The tables where you count in your example, are both on the new cluster, correct?

Are the tables ACID?

What version of hive are you using?

tpzenes · ‎03-11-2021

Hi,

Both tables are on the new cluster and they are stored as ORC format, so they are acid tables and support insert/update/delete operations. I am using Hive 3.1.0 .

isoardi · ‎03-11-2021

Hi,

can you try to force hive compaction?

ALTER TABLE <db_name>.<table_name> PARTITION (<partition_name>='<partition_value>') COMPACT 'major';

with the show compactions command it checks when it has finished and the result (failed or ok).

After this it tries to execute the count again

asish · ‎03-12-2021

COuld you please collect STATS and check,if it returns in corect counts

https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_cloud-data-access/content/hive-analyzing-ta...

Cloudera Community

Support Questions

Count query is returning different values on two same tables

Hive query to check mathematical values from multi...

Jolt Transformation returning null values

How to connect to CDW (Impala) to return actively ...

UNIX_TIMESTAMP function returns different values w...

Optimizing Hive queries for ORC formatted tables

How to find the count of NULL values in Hive

Pyspark: Table Dataframe returning empty records f...

Impala Query on Hive transactional table returns 0...

Query To Return Result Only If Data Exists in Mult...

Count total number of hive tables across all datab...