Is there some way to synchronize local folder with HDFS folder every day?
I used hadoop-fuse-dfs to mount HDFS and then put my files there. Now I want the files in HDFS to stay up to date: if a user changes a file in the local folder, the change should automatically be reflected in HDFS.
I tried using rsync, but it doesn't work well with the Hadoop filesystem:
rsync: rename "/u01/HDFS/mnt/win.121/e38358/.target.json.KTnJyN" -> "win.121/e38358/target.json": Input/output error (5)
Can somebody help me?
Would simply using the FUSE mount as the "local folder" view not work?
Another option would be to run a local DistCp job with -update set, something akin to hadoop distcp -Dmapred.job.tracker=local -update hdfs://source-remote-path file:///local-destination.
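To make the DistCp suggestion concrete, here is a hedged sketch. The hostname, port, and paths (namenode:8020, /data/source, /u01/local-mirror) are placeholders you would replace with your own; -update skips files that already match at the destination, so repeated runs only copy changes.

```shell
# Daily one-way sync of an HDFS directory to a local directory using
# DistCp in local mode (no cluster MapReduce job). Paths are placeholders.
hadoop distcp -Dmapred.job.tracker=local -update \
    hdfs://namenode:8020/data/source \
    file:///u01/local-mirror

# To run it every day, a cron entry could look like (placeholder schedule):
# 0 2 * * * hadoop distcp -Dmapred.job.tracker=local -update \
#     hdfs://namenode:8020/data/source file:///u01/local-mirror
```

For the reverse direction (local folder into HDFS), swap the source and destination URIs.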
The rsync command may not currently work over FUSE mounts due to https://issues.apache.org/jira/browse/HDFS-4160. However, CDH5 (currently in beta) carries a built-in HDFS NFSv3 gateway that could perhaps be leveraged to run rsync instead (or, again, used directly as the local view).
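A rough sketch of the NFS-gateway approach, assuming the HDFS NFSv3 gateway is already running on a host called nfs-gateway (a placeholder name); the mount options follow the Hadoop NFS gateway documentation:

```shell
# Mount HDFS via the NFSv3 gateway (hypothetical host "nfs-gateway").
sudo mkdir -p /hdfs_nfs
sudo mount -t nfs -o vers=3,proto=tcp,nolock nfs-gateway:/ /hdfs_nfs

# Then rsync the local folder against the NFS mount instead of the
# FUSE mount, avoiding the rename error from HDFS-4160:
rsync -av /u01/local-data/ /hdfs_nfs/data/
```

Whether rsync's temp-file-and-rename strategy behaves well here depends on the gateway's write support, so this is worth testing on a small directory first.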
This copies the data to the NameNode host even when we run it from a DataNode host. Is there a way to update and sync the data on the local host from which we are running the command?