Copy files within HDFS based on the modified time or access time
Labels: Apache Hadoop
Created 07-14-2016 10:28 AM
I have to write a script to move CSV files from one location in HDFS to another staging location in HDFS, based on date. For now I need to move the files from April 2nd, 2016. Later I have to schedule it so that files are picked up every hour and moved to the staging location. Hive tables are created on top of this staging location.
Created 07-14-2016 06:51 PM
1) To move the files from April 2nd to another folder in HDFS:

# list the source directory, keep the entries dated 2016-04-02, and move each file
for i in $(hdfs dfs -ls /old_data/dataset/ | grep "2016-04-02" | awk '{print $8}'); do
  echo "${i}"
  hdfs dfs -mv "${i}" /old_data/dataset/TEST/
done

2) Once the above works, you can simply set up a crontab to run it on a schedule (see the sketch below).

Please try this scenario out on a test folder in a non-production environment first.
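If it helps, here is a minimal sketch of what the hourly pickup and the crontab entry could look like. The script name, the source and staging paths, and the one-hour window based on the HDFS modification timestamp shown by hdfs dfs -ls are assumptions for illustration, not part of the answer above.

#!/bin/bash
# move_to_staging.sh (hypothetical name): move files whose HDFS modification
# timestamp falls within the last hour into the staging directory.
SRC=/old_data/dataset        # assumed source directory
DST=/old_data/dataset/TEST   # assumed staging directory

# 'hdfs dfs -ls' prints the modification date and time in columns 6 and 7
CUTOFF=$(date -d '1 hour ago' '+%Y-%m-%d %H:%M')   # GNU date

hdfs dfs -ls "${SRC}" | awk -v cutoff="${CUTOFF}" '$6" "$7 >= cutoff {print $8}' | \
while read -r f; do
  echo "moving ${f}"
  hdfs dfs -mv "${f}" "${DST}/"
done

The crontab entry (edit with crontab -e) would then run the script at the top of every hour, with the script path being a placeholder:

0 * * * * /path/to/move_to_staging.sh >> /tmp/move_to_staging.log 2>&1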
Created 07-14-2016 07:28 PM
Have you tried the Falcon mirroring feature? Instead of cluster-to-cluster replication, you can replicate to a different directory within the same cluster.
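For context, a Falcon-based approach would mean defining cluster and feed entities in XML and submitting them through the Falcon CLI; the XML file names and the feed name below are placeholders, so treat this only as a rough sketch of the flow rather than a working configuration.

# submit the cluster definition (file name is a placeholder)
falcon entity -type cluster -submit -file primary-cluster.xml

# submit and schedule a feed that replicates data into the staging directory
# (replication-feed.xml is a placeholder)
falcon entity -type feed -submitAndSchedule -file replication-feed.xml

# check the status of the feed by name (feed name is a placeholder)
falcon entity -type feed -name staging-replication -status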
