How to put a compressed folder into HDFS?
Labels: Apache Hadoop
Created ‎02-15-2016 09:11 AM
Hi,
I have created a folder at /home/usr/Desktop/myFolder. myFolder contains two files: a.txt and b.txt. I compressed it to myFolder.tar.gz, and I want to copy the compressed myFolder.tar.gz to my HDFS location for processing. What is the command for this? I tried the command below and got an error:
tar zxvf /home/usr/Desktop/myFolder.tar.gz myFolder/a.txt myFolder/b.txt -O | hadoop fs -put /myHDFSFolder/mergedFile.txt
Could anyone please suggest a solution?
Created 02-15-2016 09:29 AM
Hi Rushikesh,
To compress files into a tar.gz archive, use tar -zcvf; to extract them, use tar -zxvf.
To copy the compressed file to HDFS, use hadoop fs -put myFolder.tar.gz /myHDFSFolder/
The complete command would be:
SYNTAX:
tar zcvf <compressed file name>.tar.gz <source file 1> <source file 2> && hadoop fs -put <compressed file name>.tar.gz <destination location>
tar zcvf myFolder.tar.gz a.txt b.txt && hadoop fs -put myFolder.tar.gz /myHDFSFolder/
Hope this helps!
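The two steps in this answer (create the archive, then upload it) can be sketched as a small script. This is only an illustration under assumptions: the temporary working directory, the sample file contents, and the /myHDFSFolder destination are placeholders, and the upload is skipped when no hadoop client is on the PATH. Running the steps one after the other, rather than joining them with a pipe, makes sure the archive is complete before hadoop fs -put reads it.

```shell
set -eu

# Hypothetical locations for illustration; adjust to your environment.
workdir="$(mktemp -d)"
hdfs_dest="/myHDFSFolder"

# Recreate the layout from the question: myFolder containing a.txt and b.txt.
mkdir -p "$workdir/myFolder"
echo "contents of a" > "$workdir/myFolder/a.txt"
echo "contents of b" > "$workdir/myFolder/b.txt"

# Step 1: create the gzip-compressed archive.
# -C makes the stored paths relative to $workdir (myFolder/a.txt, myFolder/b.txt).
tar -czf "$workdir/myFolder.tar.gz" -C "$workdir" myFolder

# List the archive members to confirm both files are inside.
archive_listing="$(tar -tzf "$workdir/myFolder.tar.gz")"
echo "$archive_listing"

# Step 2: upload the finished archive. Skipped if no Hadoop client is installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop fs -put "$workdir/myFolder.tar.gz" "$hdfs_dest/"
else
  echo "hadoop client not found; skipping upload" >&2
fi
```

If the goal is instead a single merged text file in HDFS, as in the original attempt, hadoop fs -put accepts - as the source to read from standard input, for example: tar -xzOf myFolder.tar.gz myFolder/a.txt myFolder/b.txt | hadoop fs -put - /myHDFSFolder/mergedFile.txt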
Created ‎02-15-2016 09:20 AM
Just do hdfs dfs -copyFromLocal myFolder.tar.gz /hdfs/destination/path
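As a rough sketch of how that one-liner might be wrapped for repeated use: the helper names run and upload_archive and the DRY_RUN switch below are made up for illustration and are not part of Hadoop. By default the script only prints the hdfs commands it would run, so it can be tried on a machine without a cluster; set DRY_RUN=0 to execute them.

```shell
set -eu

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# upload_archive LOCAL_FILE HDFS_DIR
# Hypothetical helper: make sure the target directory exists,
# copy the local archive into it, then list the directory.
upload_archive() {
  local_file="$1"
  hdfs_dir="$2"
  run hdfs dfs -mkdir -p "$hdfs_dir"
  run hdfs dfs -copyFromLocal "$local_file" "$hdfs_dir/"
  run hdfs dfs -ls "$hdfs_dir"
}

upload_archive myFolder.tar.gz /hdfs/destination/path
```

The archive is stored in HDFS as an opaque file; nothing is extracted on the way in.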
Created ‎02-20-2016 01:43 PM
@Rahul Pathak, thanks for sharing this info.
Created ‎02-15-2016 09:38 AM
@kgopal, thanks for sharing this information. I got the information I needed, so I am accepting this as the best answer.
Created ‎01-27-2020 12:16 PM
Thanks for the information. Using this command caused serious performance degradation when writing to HDFS: every 128 MB block took about 20-30 seconds to write. The issue had to do with compressing the tar file. It's better to remove the "z" flag from tar and not compress.
Just to provide some numbers, writing almost 1 TB of data from local disk to HDFS took 13+ hours with compression (z), and it would eventually fail due to Kerberos ticket expiration. With the "z" flag removed, the copy to HDFS took less than an hour for the same 1 TB of data!
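A minimal sketch of the difference, assuming GNU tar and using made-up sample files and a placeholder /myHDFSFolder destination. Without z, tar only concatenates the files; with z, every byte first passes through gzip, which is a plausible explanation for the slowdown described above, though the numbers will depend on the data and hardware. The streaming upload at the end uses - to make hadoop fs -put read from standard input, and is skipped when no hadoop client is installed.

```shell
set -eu

# Hypothetical working directory and sample data for illustration.
workdir="$(mktemp -d)"
mkdir -p "$workdir/myFolder"
echo "contents of a" > "$workdir/myFolder/a.txt"
echo "contents of b" > "$workdir/myFolder/b.txt"

# Without z: plain tar archive, no compression work.
tar -cf "$workdir/myFolder.tar" -C "$workdir" myFolder

# With z: the same archive, but piped through gzip.
tar -czf "$workdir/myFolder.tar.gz" -C "$workdir" myFolder

# Both archives hold the same members; only the encoding differs.
plain_listing="$(tar -tf "$workdir/myFolder.tar")"
gzip_listing="$(tar -tzf "$workdir/myFolder.tar.gz")"
echo "$plain_listing"

# Stream the uncompressed archive straight into HDFS without a local copy.
# "-f -" writes the archive to stdout; "put -" reads it from stdin.
if command -v hadoop >/dev/null 2>&1; then
  tar -cf - -C "$workdir" myFolder | hadoop fs -put - /myHDFSFolder/myFolder.tar
else
  echo "hadoop client not found; skipping upload" >&2
fi
```

To compare the two variants on real data, prefix each tar command with time.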
