Copy files within HDFS based on the modified time or access time

Super Collaborator

I need to write a script that moves CSV files from one location in HDFS to a staging location in HDFS, selected by date. For now, I need to move the files from April 2nd, 2016. Later, I need to schedule the script so that new files are picked up every hour and moved to the staging location. Hive tables are created on top of this staging location.

1 ACCEPTED SOLUTION

Rising Star

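1) To move the files modified on April 2nd to another folder in HDFS:

# Column 6 of the ls output is the modification date; column 8 is the path.
hdfs dfs -ls /old_data/dataset/ | awk '$6 == "2016-04-02" {print $8}' | while read -r i; do echo "${i}"; hdfs dfs -mv "${i}" /old_data/dataset/TEST/; done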

2) Once the above works, you can set up a crontab entry to run it every hour.

Please try this out on a test folder in a non-production environment first.
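
The two steps above could be combined into one small script. This is only a sketch under some assumptions: the function names select_since and move_since are made up for illustration, paths are assumed to contain no whitespace, and the cutoff is compared against the modification date that hdfs dfs -ls prints in column 6. It picks up files modified on or after the cutoff, not only on that exact day.

```shell
#!/usr/bin/env bash
# Sketch: move CSV files modified on or after a cutoff date to a staging directory.
# Function names and arguments are illustrative, not from the original post.

# Read `hdfs dfs -ls` output on stdin and print the paths of regular .csv files
# whose modification date (column 6, YYYY-MM-DD) is on or after the cutoff.
# ISO dates compare correctly as plain strings. Assumes paths have no whitespace.
select_since() {
  awk -v cutoff="$1" '$1 ~ /^-/ && $6 >= cutoff && $8 ~ /\.csv$/ { print $8 }'
}

# List the source directory, filter by date, and move each match to the destination.
move_since() {
  local src="$1" dest="$2" cutoff="$3"
  hdfs dfs -ls "$src" | select_since "$cutoff" | while read -r path; do
    echo "moving $path"
    hdfs dfs -mv "$path" "$dest"
  done
}
```

Saved as a script (say /path/to/move_to_staging.sh, a placeholder) that ends with a call such as move_since /old_data/dataset /old_data/dataset/TEST 2016-04-02, it could be scheduled hourly with a crontab line like "0 * * * * /path/to/move_to_staging.sh". Because moved files leave the source folder, each run should only see files that arrived since the previous one. Files that are still being written when the job runs may need extra handling.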


2 REPLIES

Expert Contributor