Move file from one HDFS directory to another using Scala/Java
Labels: Apache Hadoop, Apache Spark
Created ‎05-23-2018 07:41 AM
I have files in one HDFS folder and, after checking a few things, I want to move a file to another directory on HDFS.
Currently I am using the FileSystem object's rename(). It does the job, but it is really renaming the file using its complete path.
Is there any other way to do it?
I appreciate your help.
Thanks,
Created ‎05-24-2018 07:55 AM
The answer is no. Renaming is the way to move files on HDFS: FileSystem.rename(). In fact, this is exactly what the HDFS shell command "-mv" does as well; you can check it in the source code. If you think about it, it is pretty logical: when you move a file on the distributed file system, you don't really move any blocks of the file, you just update the "path" metadata of the file in the NameNode.
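As a rough illustration of moving a file by renaming it, here is a minimal Java sketch. It assumes the Hadoop client libraries and a core-site.xml are on the classpath. The class name HdfsMoveSketch, the helpers targetFor and move, and the /data/in and /data/out paths are hypothetical and only serve the example. The destination is built from the target directory plus the source file name, so only the parent directory changes.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsMoveSketch {

    // Builds the destination path: same file name, different parent directory.
    static Path targetFor(Path source, Path targetDir) {
        return new Path(targetDir, source.getName());
    }

    // "Moves" the file by renaming it and returns whatever rename() reports.
    static boolean move(FileSystem fs, Path source, Path targetDir) throws IOException {
        return fs.rename(source, targetFor(source, targetDir));
    }

    public static void main(String[] args) throws IOException {
        // Uses the default file system configured on the classpath (assumption).
        FileSystem fs = FileSystem.get(new Configuration());
        // Hypothetical paths, for illustration only.
        boolean moved = move(fs, new Path("/data/in/example.txt"), new Path("/data/out"));
        System.out.println("moved = " + moved);
    }
}
```

The boolean returned by rename() is worth inspecting, since a false result is not reported as an exception.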
Created ‎05-23-2018 01:56 PM
Please can you give a concrete example of what you intend to do? It is hard to picture from your explanation.
Created ‎05-24-2018 06:29 AM
@Geoffrey Shelton Okot, I have a few files in an HDFS directory. I simply want to move files from one HDFS directory to another.
For example: I have the file abc.txt in the pqr directory and want to move it to the lmn directory.
Move /apps/pqr/abc.txt to /apps/lmn/abc.txt
Created ‎05-24-2018 07:09 AM
To copy files between HDFS directories you need to have the correct permissions, i.e. in your example, moving /apps/pqr/abc.txt to /apps/lmn/abc.txt.
I assume the HDFS directory owners are pqr and lmn respectively, where the former has to have write permission to /apps/lmn/. Otherwise, run the copy command as the HDFS superuser hdfs and then change the file ownership as demonstrated below.
Switch to the hdfs user
# su - hdfs
Now copy abc.txt from source to destination
$ hdfs dfs -cp /apps/pqr/abc.txt /apps/lmn/
Check the permissions; see the example
$ hdfs dfs -ls /apps/lmn
Found 3 items
drwxr-xr-x+ - lmn hdfs 0 2018-05-24 00:40 /user/lmn/acls
drwxr-xr-x+ - hdfs hdfs 0 2018-05-24 00:40 /user/lmn/abc.txt
-rw-r--r-- 3 lmn hdfs 642 2018-05-24 08:45 /user/lmn/derby.log
Change the ownership recursively for the directory; this should also change the ownership of abc.txt
$ hdfs dfs -chown -R lmn /apps/lmn
I hope that helps
Created ‎05-25-2018 09:01 AM
@Geoffrey Shelton Okot, thanks for your time, but I was not looking for a command-line option (everyone knows that one).
Created ‎05-25-2018 09:03 AM
@gnovak Thanks for understanding my question correctly. I have done the same in my Scala code; however, I wanted to get others' opinions on this.
Created ‎05-31-2018 02:28 PM
@gnovak, to satisfy my need I am calling FileSystem.rename(src, tgt). If the target path does not exist, will it be created?
My understanding is that it will create the target path. However, in my case I am able to move the file as expected on my local machine, but with the same code deployed on the cluster I am not able to move the file to the desired location. It does not give me any exception; it simply does not do the job.
Created ‎05-31-2018 02:48 PM
@RAUI No, it won't create it; the target directory must exist. However, if the target directory doesn't exist, it won't throw an exception; it will only indicate the error via the return value (as described in the documentation).
So 1) you should create the target directory before you call rename() and 2) you should check the return value, like this:
fs.mkdirs(new Path("/your/target/path"));
boolean result = fs.rename(
    new Path("/your/source/path/your.file"),
    new Path("/your/target/path/your.file"));
if (!result) { ... }
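The two steps above (create the target directory, then check what rename() returns) could be wrapped in a small helper, sketched below. This is only one possible shape: the class name SafeHdfsMove and the method moveInto are hypothetical, and turning a false return value into an IOException is a design choice made for this example, not something the Hadoop API does for you.

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SafeHdfsMove {

    // Moves 'source' into 'targetDir', keeping the file name.
    // Returns the new path, or throws if either step reports failure.
    static Path moveInto(FileSystem fs, Path source, Path targetDir) throws IOException {
        // mkdirs() behaves roughly like "mkdir -p": missing parents are created too.
        if (!fs.mkdirs(targetDir)) {
            throw new IOException("Could not create target directory " + targetDir);
        }
        Path target = new Path(targetDir, source.getName());
        // rename() signals failure through its return value,
        // so a false result is converted into an exception here.
        if (!fs.rename(source, target)) {
            throw new IOException("rename failed: " + source + " -> " + target);
        }
        return target;
    }
}
```

A caller would obtain fs the usual way (for example FileSystem.get(new Configuration())) and call SafeHdfsMove.moveInto(fs, new Path("/your/source/path/your.file"), new Path("/your/target/path")). Failing loudly makes the silent "nothing happened" case on the cluster visible in the logs.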
Created ‎05-31-2018 05:39 PM
@gnovak thanks for your time 🙂
