Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

copy files within hdfs based on the modified time or access time

Super Collaborator

I need to write a script that moves CSV files from one location in HDFS to a staging location in HDFS, selected by date. For now I need to move the files dated from April 2nd, 2016. Later I need to schedule it so that files are picked up every hour and moved to the staging location. Hive tables are created on top of this staging location.

1 ACCEPTED SOLUTION

Rising Star

1) To move the files last modified on 2nd April to another folder in HDFS (column 6 of the hdfs dfs -ls output is the modification date, column 8 is the path):

for i in $(hdfs dfs -ls /old_data/dataset/ | awk '$6 == "2016-04-02" {print $8}'); do echo "${i}"; hdfs dfs -mv "${i}" /old_data/dataset/TEST/; done

2) Once the above works, you can set up a crontab entry to run it on a schedule.

Please try this out on a test folder in a non-production environment first.
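
The one-liner above only matches a single day. If the goal is every file modified on or after a cutoff date, the steps could be sketched as below. This is a minimal sketch, not a tested production script: the function names are hypothetical, the paths are the example paths from this thread, and it assumes the usual 8-column hdfs dfs -ls layout and paths without spaces.

```shell
# Sketch: select HDFS files by modification date, then move them.
# SRC_DIR and DEST_DIR are the example paths from this thread (assumption);
# the function names are hypothetical.
SRC_DIR="/old_data/dataset"
DEST_DIR="/old_data/dataset/TEST"

# Read `hdfs dfs -ls` output on stdin and print the paths of regular files
# whose modification date (column 6, YYYY-MM-DD) is on or after the cutoff.
# Directory lines start with "d" and the "Found N items" header has fewer
# than 8 fields, so both are skipped. ISO dates compare correctly as strings.
select_files_since() {
  awk -v cutoff="$1" 'NF >= 8 && $1 !~ /^d/ && $6 >= cutoff { print $8 }'
}

# List the source directory and move every selected file to the destination.
move_files_since() {
  hdfs dfs -ls "$SRC_DIR" | select_files_since "$1" |
    while IFS= read -r path; do
      echo "moving ${path}"
      hdfs dfs -mv "$path" "$DEST_DIR/"
    done
}
```

To use it, save the functions in a script, add a call such as move_files_since 2016-04-02 at the end, and run it against a test folder first. For the hourly pickup, a crontab entry along the lines of 0 * * * * /path/to/move_since.sh would run it at the top of every hour; the script path here is a placeholder. Because moved files leave the source directory, each run should only pick up files that arrived since the previous one.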


2 REPLIES

Expert Contributor