Copy files within HDFS based on modified time or access time

Super Collaborator

I need to write a script that moves CSV files from one location in HDFS to a staging location, also in HDFS, based on date. For now I need to move the files from April 2nd, 2016. Later I need to schedule the script so that files are picked up every hour and moved to the staging location. Hive tables are created on top of this staging location.

1 ACCEPTED SOLUTION

Rising Star

1) To move the files from April 2nd to another folder in HDFS:

for i in $(hdfs dfs -ls /old_data/dataset/ | awk '$6 == "2016-04-02" {print $8}'); do echo "${i}"; hdfs dfs -mv "${i}" /old_data/dataset/TEST/; done

2) Once the above works, you can set up a crontab entry to run it on a schedule.
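
For the hourly schedule, a crontab entry along these lines might work. The script path and log path are hypothetical placeholders, and the script is assumed to wrap the loop from step 1:

```
# minute hour day-of-month month day-of-week  command
0 * * * * /path/to/move_to_staging.sh >> /tmp/move_to_staging.log 2>&1
```

This runs the script at the top of every hour and appends its output to a log file, so failed moves can be checked afterwards.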

Please try this out on a test folder in a non-production environment first.
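
To make that kind of trial run safer, the loop can be wrapped in a small script with a dry-run switch. This is only a sketch: the function names and the DRY_RUN variable are made up for illustration, and it assumes the usual 8-column `hdfs dfs -ls` layout, where column 6 is the modification date and column 8 is the path.

```shell
#!/usr/bin/env bash
# Sketch: move HDFS files whose modification date matches a given day.
# Function names and DRY_RUN are illustrative assumptions, not a standard tool.

# Read `hdfs dfs -ls` output on stdin and print the path column of every
# line whose date column equals $1 (format YYYY-MM-DD).
# Assumes columns: perms, replication, owner, group, size, date, time, path.
# Paths containing spaces are not handled.
paths_modified_on() {
  awk -v day="$1" '$6 == day { print $8 }'
}

# Move every entry in directory $1 that was modified on day $3 into $2.
# With DRY_RUN=1 (the default here) it only prints what it would do.
move_by_date() {
  local src="$1" dest="$2" day="$3" path
  hdfs dfs -ls "$src" | paths_modified_on "$day" | while read -r path; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would move: $path -> $dest"
    else
      hdfs dfs -mv "$path" "$dest"
    fi
  done
}

# Example call (dry run unless DRY_RUN=0 is set):
# move_by_date /old_data/dataset/ /old_data/dataset/TEST/ 2016-04-02
```

Running it first with the default dry run shows which paths would be moved. Setting DRY_RUN=0 then performs the actual `hdfs dfs -mv` calls.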


2 REPLIES

Expert Contributor