Support Questions

Find answers, ask questions, and share your expertise

Restore partitions in another Hive or Impala after load data into another HDFS

avatar
Explorer

Hi all,

 

I am using Impala, but Hive should have the same problem as Hive is more general so I put the questions in the Hive forum.

 

I want to create partitions in another Impala table after I transfer the HDFS data from one HDFS to another, I used some internal file system transfer API, the HDFS is using some comercial storage, not like the original compute/data node disk based HDFS.

 

So, after the file transfer, I rerun the show create table result from the source table, the table is created in target.

And I need to update table add partitions for original tables to make target table recognize the partitions in the HDFS.

 

To automate that, it involves read the partitions from source table and use some ways to restore the partitions in the target.

 

Is there any easy way to do that?

 

Thanks!

 

Regards,

Wenbin

1 ACCEPTED SOLUTION

avatar

Hi Wenbin,

 

I hope I understood well your use case. So you say that the data files are transferred to the correct HDFS location (with proper partitioning format directories, like partitionname=partitionvalue) but you want to make aware the Hive that there is a new partition on the HDFS.

In this case you need the

 

MSCK REPAIR TABLE table_name

 

command, please see:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartiti...)

In this case you don't need to execute ALTER TABLE ADD PARTITION for each new partition, Hive will recognize it.

 

In the newer Impala versions the same functionality exists in Impala as command:

ALTER TABLE table_name RECOVER PARTITIONS

 

Regards

 Miklos Szurap

Customer Operations Engineer

View solution in original post

1 REPLY 1

avatar

Hi Wenbin,

 

I hope I understood well your use case. So you say that the data files are transferred to the correct HDFS location (with proper partitioning format directories, like partitionname=partitionvalue) but you want to make aware the Hive that there is a new partition on the HDFS.

In this case you need the

 

MSCK REPAIR TABLE table_name

 

command, please see:

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-RecoverPartiti...)

In this case you don't need to execute ALTER TABLE ADD PARTITION for each new partition, Hive will recognize it.

 

In the newer Impala versions the same functionality exists in Impala as command:

ALTER TABLE table_name RECOVER PARTITIONS

 

Regards

 Miklos Szurap

Customer Operations Engineer