10-29-2013 05:54 AM
Is there some way to synchronize a local folder with an HDFS folder every day?
I used hadoop-fuse-dfs to mount HDFS and then put my files there. Now I want the files in HDFS to stay up to date: if a user changes a file in the local folder, it should automatically change in HDFS.
I was trying to use rsync, but it doesn't work very well against the Hadoop FUSE mount:
rsync: rename "/u01/HDFS/mnt/win.121/e38358/.target.json.KTnJyN" -> "win.121/e38358/target.json": Input/output error (5)
Can somebody help me?
12-28-2013 09:10 PM - edited 12-28-2013 09:11 PM
Would simply using the FUSE mount as the "local folder" view not work?
Another option would be to run a local DistCp job with -update set. Something akin to hadoop distcp -Dmapred.job.tracker=local -update hdfs://source-remote-path file:///local-destination.
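To make the DistCp suggestion concrete, here is a minimal sketch of a script that could be run nightly from cron. The SRC and DST paths (and the NameNode address) are placeholders invented for illustration, not values from this thread; -update makes DistCp copy only files whose size or checksum differs, so unchanged files are skipped.

```shell
#!/bin/sh
# Sketch of a nightly one-way sync into HDFS via local-mode DistCp.
# Paths and host below are assumptions; substitute your own.
SRC="file:///data/local-folder"
DST="hdfs://namenode:8020/backups/local-folder"
CMD="hadoop distcp -Dmapred.job.tracker=local -update $SRC $DST"
# Echo the command rather than running it, since the paths are placeholders;
# replace echo with eval (or put the command in a crontab entry) once confirmed.
echo "$CMD"
```

Reversing SRC and DST gives the opposite direction (HDFS down to the local disk), again with -update to skip unchanged files.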
The rsync command may not currently work over FUSE mounts due to https://issues.apache.org/jira/browse/HDFS-4160. However, CDH5 (currently in beta) carries an inbuilt HDFS-NFSv3 proxy that can perhaps be leveraged to run rsyncs instead (or again, used directly as its local view).
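If rsync over the NFSv3 gateway is attempted, one detail worth noting: the failure in HDFS-4160 is triggered by rsync's default write-to-temp-file-then-rename behaviour, and rsync's --inplace flag writes the destination file directly, avoiding the rename. A hedged sketch, assuming the gateway is mounted at /hdfs_nfs (that mount point and the paths are invented here):

```shell
#!/bin/sh
# Sketch: mirror a local folder into HDFS through an NFS gateway mount.
# --inplace avoids the temp-file rename that fails over FUSE (HDFS-4160);
# -rtv = recursive, preserve mtimes, verbose.
RSYNC_CMD="rsync -rtv --inplace /u01/local-folder/ /hdfs_nfs/user/me/local-folder/"
# Echoed rather than executed, since the mount point and paths are assumptions.
echo "$RSYNC_CMD"
```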
01-04-2018 04:21 PM
This copies the data to the namenode host, even when we run it from a datanode host. Is there a way to update and sync the data onto the local host from which we are running the command?