Support Questions
Find answers, ask questions, and share your expertise
Announcements
Check out our newest addition to the community, the Cloudera Innovation Accelerator group hub.

I would like a move hive data into a archival directory(certain location) based on date range.

Rising Star

I have hive data with ORC format, and there is a date column on which we are doing partitioned and i would like to move the data to archival directory based on partitioned column name date and if it 30 days older.

1 ACCEPTED SOLUTION

Expert Contributor

Hi @Praveen PentaReddy,

1. If you want to backup a data in archive location as a table, then create a table as original one

create external table MYTABLE_archive like MYTABLE location '<hdfs_archive_location>';

2. Copy the data from original table to archive one:

2.a. using HQL:

INSERT INTO TABLE MYTABLE_archive partition(date) SELECT * from MYTABLE where date>= <start_date> and date<=<end_date>;
-- dynamic partitioning should be enabled;

2.b. using file transfer: Identify location of each partition to be copied. Copy directories to the location of archive table and then add partitions:

hdfs dfs -cp <orig_partitions_loc> <archive_table_location>/
# alternatively you can use distcp in case of huge volumes, but check if you have permissions to execute it. Also make sure you understand the difference between those two command in regards to file attributes.

then add new partitions in archve:

MSCK REPAIR TABLE MYTABLE_archive;

View solution in original post

1 REPLY 1

Expert Contributor

Hi @Praveen PentaReddy,

1. If you want to backup a data in archive location as a table, then create a table as original one

create external table MYTABLE_archive like MYTABLE location '<hdfs_archive_location>';

2. Copy the data from original table to archive one:

2.a. using HQL:

INSERT INTO TABLE MYTABLE_archive partition(date) SELECT * from MYTABLE where date>= <start_date> and date<=<end_date>;
-- dynamic partitioning should be enabled;

2.b. using file transfer: Identify location of each partition to be copied. Copy directories to the location of archive table and then add partitions:

hdfs dfs -cp <orig_partitions_loc> <archive_table_location>/
# alternatively you can use distcp in case of huge volumes, but check if you have permissions to execute it. Also make sure you understand the difference between those two command in regards to file attributes.

then add new partitions in archve:

MSCK REPAIR TABLE MYTABLE_archive;