Created 01-20-2017 10:03 PM
I have hive data with ORC format, and there is a date column on which we are doing partitioned and i would like to move the data to archival directory based on partitioned column name date and if it 30 days older.
Created 01-21-2017 11:54 PM
1. If you want to backup a data in archive location as a table, then create a table as original one
create external table MYTABLE_archive like MYTABLE location '<hdfs_archive_location>';
2. Copy the data from original table to archive one:
2.a. using HQL:
INSERT INTO TABLE MYTABLE_archive partition(date) SELECT * from MYTABLE where date>= <start_date> and date<=<end_date>; -- dynamic partitioning should be enabled;
2.b. using file transfer: Identify location of each partition to be copied. Copy directories to the location of archive table and then add partitions:
hdfs dfs -cp <orig_partitions_loc> <archive_table_location>/ # alternatively you can use distcp in case of huge volumes, but check if you have permissions to execute it. Also make sure you understand the difference between those two command in regards to file attributes.
then add new partitions in archve:
MSCK REPAIR TABLE MYTABLE_archive;
Created 01-21-2017 11:54 PM
1. If you want to backup a data in archive location as a table, then create a table as original one
create external table MYTABLE_archive like MYTABLE location '<hdfs_archive_location>';
2. Copy the data from original table to archive one:
2.a. using HQL:
INSERT INTO TABLE MYTABLE_archive partition(date) SELECT * from MYTABLE where date>= <start_date> and date<=<end_date>; -- dynamic partitioning should be enabled;
2.b. using file transfer: Identify location of each partition to be copied. Copy directories to the location of archive table and then add partitions:
hdfs dfs -cp <orig_partitions_loc> <archive_table_location>/ # alternatively you can use distcp in case of huge volumes, but check if you have permissions to execute it. Also make sure you understand the difference between those two command in regards to file attributes.
then add new partitions in archve:
MSCK REPAIR TABLE MYTABLE_archive;