Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

I would like a move hive data into a archival directory(certain location) based on date range.

Solved Go to solution

I would like a move hive data into a archival directory(certain location) based on date range.

Rising Star

I have hive data with ORC format, and there is a date column on which we are doing partitioned and i would like to move the data to archival directory based on partitioned column name date and if it 30 days older.

1 ACCEPTED SOLUTION

Accepted Solutions

Re: I would like a move hive data into a archival directory(certain location) based on date range.

Expert Contributor

Hi @Praveen PentaReddy,

1. If you want to backup a data in archive location as a table, then create a table as original one

create external table MYTABLE_archive like MYTABLE location '<hdfs_archive_location>';

2. Copy the data from original table to archive one:

2.a. using HQL:

INSERT INTO TABLE MYTABLE_archive partition(date) SELECT * from MYTABLE where date>= <start_date> and date<=<end_date>;
-- dynamic partitioning should be enabled;

2.b. using file transfer: Identify location of each partition to be copied. Copy directories to the location of archive table and then add partitions:

hdfs dfs -cp <orig_partitions_loc> <archive_table_location>/
# alternatively you can use distcp in case of huge volumes, but check if you have permissions to execute it. Also make sure you understand the difference between those two command in regards to file attributes.

then add new partitions in archve:

MSCK REPAIR TABLE MYTABLE_archive;
1 REPLY 1

Re: I would like a move hive data into a archival directory(certain location) based on date range.

Expert Contributor

Hi @Praveen PentaReddy,

1. If you want to backup a data in archive location as a table, then create a table as original one

create external table MYTABLE_archive like MYTABLE location '<hdfs_archive_location>';

2. Copy the data from original table to archive one:

2.a. using HQL:

INSERT INTO TABLE MYTABLE_archive partition(date) SELECT * from MYTABLE where date>= <start_date> and date<=<end_date>;
-- dynamic partitioning should be enabled;

2.b. using file transfer: Identify location of each partition to be copied. Copy directories to the location of archive table and then add partitions:

hdfs dfs -cp <orig_partitions_loc> <archive_table_location>/
# alternatively you can use distcp in case of huge volumes, but check if you have permissions to execute it. Also make sure you understand the difference between those two command in regards to file attributes.

then add new partitions in archve:

MSCK REPAIR TABLE MYTABLE_archive;