Created 07-14-2016 10:28 AM
I need to write a script that moves CSV files from one location in HDFS to a staging location in HDFS, selected by date. For now I need to move the files from April 2nd, 2016. Later I need to schedule it so that files are picked up every hour and moved to the staging location. Hive tables are created on top of this staging location.
Created 07-14-2016 06:51 PM
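1) To move the files from April 2nd to another folder in HDFS (this picks every entry whose listing line contains 2016-04-02, whether in the modification date or in the file name):
for i in $(hdfs dfs -ls /old_data/dataset/ | grep "2016-04-02" | awk '{print $8}'); do echo "${i}"; hdfs dfs -mv "${i}" /old_data/dataset/TEST/; done
2) Once the above works, you can set up a crontab entry to run it every hour.
Please try this out on a test folder in a non-production environment first.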
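If "from April 2nd" means that date and everything after it, a slightly more defensive variant could compare the modification date column instead of grepping for a single day. This is only a sketch under some assumptions: the function names `select_since` and `move_since` are made up for illustration, the files are assumed to end in `.csv`, paths are assumed to contain no spaces, and the listing is assumed to use the usual `hdfs dfs -ls` layout (date in column 6, path in column 8).

```shell
# Read `hdfs dfs -ls` output on stdin and print the paths of regular .csv
# files whose modification date (column 6) is on or after the cutoff.
# Dates are YYYY-MM-DD, so a plain string comparison orders them correctly.
select_since() {
  awk -v cutoff="$1" '
    NF >= 8 && $1 ~ /^-/ && $8 ~ /\.csv$/ && ($6 "") >= (cutoff "") { print $8 }
  '
}

# Move every selected file from the source directory to the staging
# directory. Arguments: source dir, staging dir, cutoff date.
move_since() {
  ms_src="$1"
  ms_dest="$2"
  ms_cutoff="$3"
  hdfs dfs -mkdir -p "$ms_dest"
  hdfs dfs -ls "$ms_src" | select_since "$ms_cutoff" | while IFS= read -r ms_path; do
    echo "moving $ms_path"
    hdfs dfs -mv "$ms_path" "$ms_dest/"
  done
}

# Example call (paths taken from the command above):
# move_since /old_data/dataset /old_data/dataset/TEST 2016-04-02
```

Since moved files leave the source directory, running the same call again only picks up files that arrived in the meantime. For the hourly schedule, the script could be saved somewhere like /path/to/move_to_staging.sh (a hypothetical path) with the example call uncommented, and started from a crontab line such as `0 * * * * /path/to/move_to_staging.sh`. One thing to watch for is files that are still being written when the job runs.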
Created 07-14-2016 07:28 PM
Have you tried the Falcon mirroring feature? Instead of cluster-to-cluster replication, you can try replicating between different directories in the same cluster.