- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Restore partitions in another Hive or Impala after load data into another HDFS
- Labels:
-
Apache Hive
-
Apache Impala
Created on ‎11-12-2016 08:52 PM - edited ‎09-16-2022 03:47 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi all,
I am using Impala, but Hive should have the same problem as Hive is more general so I put the questions in the Hive forum.
I want to create partitions in another Impala table after I transfer the HDFS data from one HDFS to another, I used some internal file system transfer API, the HDFS is using some comercial storage, not like the original compute/data node disk based HDFS.
So, after the file transfer, I rerun the show create table result from the source table, the table is created in target.
And I need to update table add partitions for original tables to make target table recognize the partitions in the HDFS.
To automate that, it involves read the partitions from source table and use some ways to restore the partitions in the target.
Is there any easy way to do that?
Thanks!
Regards,
Wenbin
Created ‎11-21-2016 12:29 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Wenbin,
I hope I understood well your use case. So you say that the data files are transferred to the correct HDFS location (with proper partitioning format directories, like partitionname=partitionvalue) but you want to make aware the Hive that there is a new partition on the HDFS.
In this case you need the
MSCK REPAIR TABLE table_name
command, please see:
In this case you don't need to execute ALTER TABLE ADD PARTITION for each new partition, Hive will recognize it.
In the newer Impala versions the same functionality exists in Impala as command:
ALTER TABLE table_name RECOVER PARTITIONS
Regards
Miklos Szurap
Customer Operations Engineer
Created ‎11-21-2016 12:29 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi Wenbin,
I hope I understood well your use case. So you say that the data files are transferred to the correct HDFS location (with proper partitioning format directories, like partitionname=partitionvalue) but you want to make aware the Hive that there is a new partition on the HDFS.
In this case you need the
MSCK REPAIR TABLE table_name
command, please see:
In this case you don't need to execute ALTER TABLE ADD PARTITION for each new partition, Hive will recognize it.
In the newer Impala versions the same functionality exists in Impala as command:
ALTER TABLE table_name RECOVER PARTITIONS
Regards
Miklos Szurap
Customer Operations Engineer
