Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

hive partition table issue while copying into other cluster

avatar

Hi ,

I have created a partitioned table in source cluster and loaded data into the table in the hdfs location , with the help of distcp I am moving this file (hdfs ) to destination cluster and trying to fetch the record in the destination cluster but I am not able to fetch nay record it shows me 0 rows . however other than partitioned table when I do the same I am able to fetch the information .

Please find commands for the same :

source cluster :

create external table parti( id int, name string ) partitioned by (dept string) row format delimited fields terminated by ',' location '/part1'

load data inpath local '/tmp/hv1.txt' into table parti partition(dept='sat')

load data inpath local '/tmp/hv2.txt' into table parti partition(dept='jbp')

select *from parti where dept='sat'

able to get the records

Destination cluster :

create external table parti( id int, name string ) partitioned by (dept string) row format delimited fields terminated by ',' location '/part2'

distcp /part1 (source cluster ) to / (destination cluster )

hdfs dfs -mv /part2 /part2_old

hdfs dfs -mv /part1 /part2

now in destination cluster I am trying to fetch record :

select *from parti where dept='sat' ;

no records

Note : This issue I am facing only for partitioned table other tables like external /managed table non partitioned I do not face such issue while doing the same activity renaming folder .

2 REPLIES 2

avatar
Contributor

Repairing the table might help by connecting the HDFS data partitions to the meta store.

Try this,

MSCK REPAIR TABLE parti;

avatar
@Anurag Mishra

When you use the following commands to push the data into your Hive table in new cluster

hdfs dfs -mv /part2 /part2_old
hdfs dfs -mv /part1 /part2

Your Hive engine and metastore don't get any intimation that a new partition has been added. It's simply a data copy/move operation on HDFS and Hive don't have any idea about it.

A show partitions operation on your table in new cluster won't show anything.

show partitions <your table name>;    //Should not return anything

You can tell your Hive engine to look into the HDFS and identify if some new data has been added outside of Hive by using the following command.

msck repair table <your table name>

Now if you do a show partitions, you shall be able to see the partitions that you just "created" by using the HDFS commands. Also, a select operation should work fine.

If this reply helps you understand and fix your issue, please mark it as Accepted so that other community users can benefit from it.