Synchronize local folder and folder in HDFS


Contributor

Hi all,

 

Is there some way to synchronize a local folder with an HDFS folder every day?

I used hadoop-fuse-dfs to mount HDFS and then put my files there. Now I want all files in HDFS to stay up to date: if a user changes a file in the local folder, it should automatically change in HDFS.

 

I was trying to use rsync, but it doesn't work very well with the Hadoop filesystem.

 

rsync: rename "/u01/HDFS/mnt/win.121/e38358/.target.json.KTnJyN" -> "win.121/e38358/target.json": Input/output error (5)

 

Can somebody help me?

 

Regards,

Markovich


Re: Synchronize local folder and folder in HDFS

Master Guru

Would simply using the FUSE mount as the "local folder" view not work?

 

Another option would be to run a local DistCp job with -update set. Something akin to hadoop distcp -Dmapred.job.tracker=local -update hdfs://source-remote-path file:///local-destination.
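As a rough sketch of the DistCp suggestion, the helper below only builds and prints the command line so it can be reviewed before running. The function name build_distcp_cmd and the "mr1"/"mr2" switch are hypothetical, and the paths are the placeholders from the reply, not real locations. The -Dmapreduce.framework.name=local option for the "mr2" case is an assumption about the MRv2/YARN counterpart of the MRv1 setting and should be checked against your Hadoop version.

```shell
#!/bin/sh
# Sketch: build a local-mode DistCp command with -update.
# Nothing is executed here; the command is only printed.

# build_distcp_cmd SRC DST [FRAMEWORK]
# FRAMEWORK is "mr1" (default, as in the reply) or "mr2" (assumed equivalent).
build_distcp_cmd() {
    src=$1
    dst=$2
    case "${3:-mr1}" in
        mr1) local_opt="-Dmapred.job.tracker=local" ;;        # MRv1 local job runner
        mr2) local_opt="-Dmapreduce.framework.name=local" ;;  # assumption for MRv2/YARN
        *)   echo "unknown framework: $3" >&2; return 1 ;;
    esac
    # -update copies only files that are missing or differ at the destination.
    echo "hadoop distcp $local_opt -update $src $dst"
}

# Placeholder paths taken from the reply; replace with real ones.
build_distcp_cmd "hdfs://source-remote-path" "file:///local-destination"
```

Swapping the two path arguments would copy from the local folder into HDFS instead, which is the direction the original question asks about. For a daily sync, the printed command could be placed in a cron job once it has been verified by hand.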

 

The rsync command may not currently work over FUSE mounts due to https://issues.apache.org/jira/browse/HDFS-4160. However, CDH5 (currently in beta) carries an inbuilt HDFS-NFSv3 proxy that can perhaps be leveraged to run rsyncs instead (or again, used directly as its local view).

Re: Synchronize local folder and folder in HDFS

Explorer
Hi, any similar solution for YARN? I guess this one works for MRv1 only.

Re: Synchronize local folder and folder in HDFS

Explorer

This copies the data to the NameNode host, even when we run it from a DataNode host. Is there a way to update and sync the data on the local host from which we run the command?