Created 04-27-2018 08:35 AM
Hi ,
I have created a partitioned table in source cluster and loaded data into the table in the hdfs location , with the help of distcp I am moving this file (hdfs ) to destination cluster and trying to fetch the record in the destination cluster but I am not able to fetch nay record it shows me 0 rows . however other than partitioned table when I do the same I am able to fetch the information .
Please find commands for the same :
source cluster :
create external table parti( id int, name string ) partitioned by (dept string) row format delimited fields terminated by ',' location '/part1'
load data inpath local '/tmp/hv1.txt' into table parti partition(dept='sat')
load data inpath local '/tmp/hv2.txt' into table parti partition(dept='jbp')
select *from parti where dept='sat'
able to get the records
Destination cluster :
create external table parti( id int, name string ) partitioned by (dept string) row format delimited fields terminated by ',' location '/part2'
distcp /part1 (source cluster ) to / (destination cluster )
hdfs dfs -mv /part2 /part2_old
hdfs dfs -mv /part1 /part2
now in destination cluster I am trying to fetch record :
select *from parti where dept='sat' ;
no records
Note : This issue I am facing only for partitioned table other tables like external /managed table non partitioned I do not face such issue while doing the same activity renaming folder .
Created 04-27-2018 11:03 PM
Repairing the table might help by connecting the HDFS data partitions to the meta store.
Try this,
MSCK REPAIR TABLE parti;
Created 04-28-2018 07:00 PM
When you use the following commands to push the data into your Hive table in new cluster
hdfs dfs -mv /part2 /part2_old hdfs dfs -mv /part1 /part2
Your Hive engine and metastore don't get any intimation that a new partition has been added. It's simply a data copy/move operation on HDFS and Hive don't have any idea about it.
A show partitions operation on your table in new cluster won't show anything.
show partitions <your table name>; //Should not return anything
You can tell your Hive engine to look into the HDFS and identify if some new data has been added outside of Hive by using the following command.
msck repair table <your table name>
Now if you do a show partitions, you shall be able to see the partitions that you just "created" by using the HDFS commands. Also, a select operation should work fine.
If this reply helps you understand and fix your issue, please mark it as Accepted so that other community users can benefit from it.