Contributor
Posts: 25
Registered: 10-11-2013

Synchronize local folder and folder in HDFS

Hi all,

Is there some way to synchronize a local folder with an HDFS folder every day?

I used hadoop-fuse-dfs to mount HDFS and then put my files there. Now I want all files in HDFS to stay up to date: if a user changes a file in the local folder, it should automatically be changed in HDFS as well.

I was trying to use rsync, but it doesn't work very well with the FUSE-mounted HDFS. It fails with:

rsync: rename "/u01/HDFS/mnt/win.121/e38358/.target.json.KTnJyN" -> "win.121/e38358/target.json": Input/output error (5)

Can somebody help me?

Regards,

Markovich
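One way around the rsync error above is to skip the FUSE mount entirely and push changed files with the regular `hdfs dfs` shell from a daily cron job. The sketch below is hypothetical: the script, the `.last_hdfs_sync` marker file and the paths are made up for illustration. It assumes the `hdfs` CLI is on the PATH and that your Hadoop version's `hdfs dfs -put` supports `-f` (overwrite) and `hdfs dfs -mkdir` supports `-p`.

```python
#!/usr/bin/env python3
"""Hypothetical daily one-way sync: local folder -> HDFS, via the `hdfs dfs` CLI."""
import os
import posixpath
import subprocess
import sys
import time

STAMP_NAME = ".last_hdfs_sync"  # hypothetical marker file storing the last sync time


def changed_files(local_root, since):
    """Yield (absolute_path, relative_path) for files modified after `since` (epoch seconds)."""
    for dirpath, _dirnames, filenames in os.walk(local_root):
        for name in filenames:
            if name == STAMP_NAME:  # never upload our own marker file
                continue
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) > since:
                yield path, os.path.relpath(path, local_root)


def sync_commands(local_root, hdfs_root, since):
    """Return the argv lists needed to push changed files; nothing is executed here."""
    cmds, made = [], set()
    for path, rel in sorted(changed_files(local_root, since)):
        dest = posixpath.join(hdfs_root, rel.replace(os.sep, "/"))
        parent = posixpath.dirname(dest)
        if parent not in made:  # -put does not create missing parent directories
            cmds.append(["hdfs", "dfs", "-mkdir", "-p", parent])
            made.add(parent)
        cmds.append(["hdfs", "dfs", "-put", "-f", path, dest])  # -f overwrites the old copy
    return cmds


def main(local_root, hdfs_root):
    stamp = os.path.join(local_root, STAMP_NAME)
    since = 0.0  # first run: upload everything
    if os.path.exists(stamp):
        with open(stamp) as f:
            since = float(f.read().strip() or 0)
    started = time.time()
    for cmd in sync_commands(local_root, hdfs_root, since):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on the first failure
    # Store the start time so files edited while copying are picked up on the next run.
    with open(stamp, "w") as f:
        f.write(repr(started))


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

It could be scheduled with a crontab line such as `0 2 * * * python3 /path/to/hdfs_sync.py /path/to/local-folder /user/markovich/folder` (hypothetical paths). Because the check is based on modification times, files deleted or renamed locally are not removed from HDFS; a renamed file simply shows up as a new copy.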

Posts: 1,730
Kudos: 357
Solutions: 274
Registered: 07-31-2013

Re: Synchronize local folder and folder in HDFS

Would simply using the FUSE mount as the "local folder" view not work?

Another option would be to run a local DistCp job with -update set, something akin to:

hadoop distcp -Dmapred.job.tracker=local -update hdfs://source-remote-path file:///local-destination

Swap the two paths to copy from the local disk into HDFS instead.

The rsync command may not currently work over FUSE mounts because of https://issues.apache.org/jira/browse/HDFS-4160. However, CDH5 (currently in beta) ships a built-in HDFS NFSv3 gateway that could perhaps be used to run rsync instead (or, again, be used directly as the local view).
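The local DistCp job suggested above can be wrapped in a small script so it is easy to run daily (for example from cron) in either direction. This is only a sketch: the script name and the `--yarn` switch are made up for illustration. The `-Dmapred.job.tracker=local` option from the reply is an MRv1 property; on YARN (MRv2) the corresponding setting is, to my knowledge, `-Dmapreduce.framework.name=local`. Running with the local job runner keeps the copy inside the client process on the host where you launch it, so `file:///` paths refer to that host's disk; in a regular cluster job the copy tasks may run on other nodes, where `file:///` means their own local disk.

```python
#!/usr/bin/env python3
"""Hypothetical wrapper around a local-mode `hadoop distcp -update` run."""
import subprocess
import sys


def distcp_update_cmd(src, dst, yarn=False):
    """Build the argv for a DistCp -update copy executed by the local job runner."""
    # MRv1 (as in the command above): mapred.job.tracker=local selects the local runner.
    # MRv2/YARN: mapreduce.framework.name=local is the corresponding setting.
    local_opt = "mapreduce.framework.name=local" if yarn else "mapred.job.tracker=local"
    # Generic -D options have to come before DistCp's own options and paths.
    return ["hadoop", "distcp", "-D" + local_opt, "-update", src, dst]


if __name__ == "__main__" and len(sys.argv) >= 3:
    # usage: distcp_sync.py SRC DST [--yarn]
    cmd = distcp_update_cmd(sys.argv[1], sys.argv[2], yarn="--yarn" in sys.argv[3:])
    subprocess.run(cmd, check=True)  # a non-zero DistCp exit raises CalledProcessError
```

For the original question, the local folder would be the source, e.g. `python3 distcp_sync.py file:///local-folder hdfs://namenode/target-folder --yarn` (hypothetical paths). Note that `-update` on its own does not delete destination files that were removed from the source.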

Explorer
Posts: 18
Registered: 05-02-2014

Re: Synchronize local folder and folder in HDFS

Hi, is there a similar solution for YARN? I guess this one only works for MRv1.

New Contributor
Posts: 5
Registered: 10-21-2017

Re: Synchronize local folder and folder in HDFS

This copies the data to the NameNode host, even when we run it from a DataNode host. Is there a way to update and sync the data on the local host where we run the command?
