Support Questions


How to put a compressed folder into HDFS?

Hi,

I have created a folder at /home/usr/Desktop/myFolder. "myFolder" contains two files: 1. a.txt and 2. b.txt. I compressed it to myFolder.tar.gz, and now I want to copy this compressed "myFolder.tar.gz" to an HDFS location for processing. What is the command for this? I tried the following and got an error:

tar zxvf /home/usr/Desktop/myFolder.tar.gz myFolder/a.txt myFolder/b.txt -O | hadoop fs -put /myHDFSFolder/mergedFile.txt

Could anyone please suggest a solution?
1 ACCEPTED SOLUTION

Contributor

Hi Rushikesh,

The command to compress the files to tar.gz is tar -zcvf; to extract, use tar -zxvf.

If you wish to move the compressed file into HDFS, you can use: hadoop fs -put myFolder.tar.gz /myHDFSFolder/

The complete command would be:

SYNTAX:

tar zcvf <archive name>.tar.gz <source file 1> <source file 2> && hadoop fs -put <archive name>.tar.gz <destination location>

tar zcvf myFolder.tar.gz a.txt b.txt && hadoop fs -put myFolder.tar.gz /myHDFSFolder/

(Use && rather than a pipe here: tar writes the archive to a file, not to stdout, so there is nothing to pipe, and the upload should start only after tar has finished writing the archive.)

Hope this helps!
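A minimal end-to-end sketch of this two-step approach (file names are taken from this thread; the HDFS path /myHDFSFolder is assumed to exist, and the hadoop step is commented out so the local part runs anywhere):

```shell
# create the sample folder and files from the question
mkdir -p myFolder
echo "first file"  > myFolder/a.txt
echo "second file" > myFolder/b.txt

# step 1: build the compressed archive locally
tar zcvf myFolder.tar.gz myFolder/a.txt myFolder/b.txt

# step 2: upload it -- uncomment on a node with the hadoop CLI;
# '&&' semantics would ensure the upload runs only after tar succeeds
# hadoop fs -put myFolder.tar.gz /myHDFSFolder/

# verify the archive contents locally
tar ztf myFolder.tar.gz
```

As a side note, the pipeline in the original question fails because hadoop fs -put was given only a destination; to read tar's -O output from stdin, -put needs "-" as the source, i.e. hadoop fs -put - /myHDFSFolder/mergedFile.txt.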


5 REPLIES

Expert Contributor

Just do hdfs dfs -copyFromLocal myFolder.tar.gz /hdfs/destination/path

@Rahul Pathak, thanks for sharing this info.


@kgopal thanks for sharing this information. I got the information I needed, so I am accepting this as the best answer.

New Contributor

Thanks for the information. Using this command, however, caused some serious performance degradation when writing to HDFS: every 128 MB block took about 20-30 seconds to write. The issue was the attempt to compress the tar file, so it's better to remove the "z" flag from tar and not compress.

 

Just to provide some numbers: writing almost 1 TB of data from local disk to HDFS took 13+ hours with compression (z), and it would eventually fail due to Kerberos ticket expiration. After removing the "z" flag, the copy of the same 1 TB to HDFS took less than an hour!
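The uncompressed variant can also be streamed straight into HDFS without leaving a temporary archive on local disk. A sketch, assuming the same myFolder from the thread and a writable /myHDFSFolder (the hadoop pipe is commented out, with a local file standing in for the HDFS destination so the tar part runs anywhere):

```shell
# sample input folder (names from the thread)
mkdir -p myFolder
echo "first file"  > myFolder/a.txt
echo "second file" > myFolder/b.txt

# 'tar cf -' writes an uncompressed archive to stdout (no 'z' flag);
# here a local file stands in for the HDFS destination
tar cf - myFolder > myFolder.tar

# on the cluster you would pipe directly instead, with '-' telling
# -put to read from stdin (untested sketch):
# tar cf - myFolder | hadoop fs -put - /myHDFSFolder/myFolder.tar

# verify the archive members
tar tf myFolder.tar
```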