Move file from one HDFS directory to another using Scala/Java

I have files in one HDFS folder, and after running a few checks I want to move a file to another directory on HDFS.

Currently I am using the FileSystem object's rename(). It does the job, but it is really renaming the file using its complete path.

Is there any other way to do it?

I appreciate your help.

Thanks,

1 ACCEPTED SOLUTION

Expert Contributor

@RAUI

The answer is no. Renaming is the way to move files on HDFS: FileSystem.rename(). This is exactly what the HDFS shell command "-mv" does as well; you can check it in the source code. It is logical if you think about it: when you move a file on the distributed file system, you don't move any of the file's blocks, you just update the file's "path" metadata in the NameNode.
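
To illustrate the rename-as-move approach described above, here is a minimal Scala sketch. The object name, the helper `moveToDir`, and the paths `/data/incoming/part-0001.txt` and `/data/processed` are hypothetical and only serve the example; the Hadoop calls used are `FileSystem.get`, `exists`, `mkdirs`, and `rename`. It assumes the Hadoop client libraries and the cluster configuration files are on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object MoveWithRename {

  /** Moves `src` into the directory `dstDir`, keeping the original file name.
    * (Hypothetical helper, not part of the Hadoop API.)
    */
  def moveToDir(fs: FileSystem, src: Path, dstDir: Path): Boolean = {
    // Create the target directory (and any missing parents) if it is absent.
    if (!fs.exists(dstDir)) fs.mkdirs(dstDir)

    // Build the full destination path explicitly: <dstDir>/<file name>.
    val dst = new Path(dstDir, src.getName)

    // rename() only updates metadata in the NameNode; no blocks are copied.
    // It can return false instead of throwing, so check the result.
    fs.rename(src, dst)
  }

  def main(args: Array[String]): Unit = {
    // Picks up fs.defaultFS from core-site.xml on the classpath.
    // Without that configuration, FileSystem.get falls back to the
    // local file system (file:///) instead of HDFS.
    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    // Hypothetical example paths.
    val moved = moveToDir(
      fs,
      new Path("/data/incoming/part-0001.txt"),
      new Path("/data/processed")
    )
    println(s"moved: $moved")
  }
}
```

Building the destination as `new Path(dstDir, src.getName)` keeps the file name unchanged, so the call behaves like a move into the target directory even though the API is named rename.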

12 REPLIES

@gnovak, I am still wondering why it created the directory on my local machine. Kind of weird...

Related to this, I have another issue. I am also reading files from an HDFS directory using wholeTextFiles(). My HDFS input directory contains text files and subdirectories. On my local development machine I was able to read the files, and wholeTextFiles() did not consider the subdirectories. However, when I deployed the same code to the cluster, it started to consider the subdirectories as well. Do you have any idea why? I appreciate your help on this.

Expert Contributor
@RAUI wholeTextFiles() is not part of the HDFS API. I assume you're using Spark, which I'm not too familiar with. I suggest you post a separate question about this on HCC.

Explorer

@RAUI

Yes, there is another way to achieve this. You can use the copy() method of the FileUtil class and pass your FileSystem object to it to copy your files from the source HDFS location to the target. As with rename(), you need to ensure your target directory exists before calling copy(). FileUtil.copy() has a signature that takes a source and a destination FileSystem; in this case you would pass the same FileSystem object for both, since you are copying files to a different location on the same HDFS. There is also a boolean option to delete the source file after the copy, if that fits your use case.

Here is a link to the FileUtil API: http://hadoop.apache.org/docs/r2.8.0/api/org/apache/hadoop/fs/FileUtil.html
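
As a rough sketch of the FileUtil.copy() approach described in this reply, the Scala snippet below uses the overload `copy(srcFS, src, dstFS, dst, deleteSource, conf)` with the same FileSystem for source and destination. The object name and the paths are hypothetical, and it assumes the Hadoop client libraries and cluster configuration are on the classpath.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

object MoveWithCopy {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    val fs = FileSystem.get(conf)

    // Hypothetical example paths.
    val src = new Path("/data/incoming/part-0001.txt")
    val dstDir = new Path("/data/processed")

    // Make sure the target directory exists before copying.
    fs.mkdirs(dstDir)

    // Same FileSystem on both sides because source and target are on
    // the same HDFS. deleteSource = true removes the source after a
    // successful copy, which makes this behave like a move.
    val ok = FileUtil.copy(
      fs, src,
      fs, new Path(dstDir, src.getName),
      true, // deleteSource
      conf
    )
    println(s"copied and removed source: $ok")
  }
}
```

Unlike rename(), this reads the file's data and writes it again at the destination, so for large files on the same HDFS it is likely to be noticeably slower than a metadata-only rename.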