Created 02-15-2016 09:11 AM
Hi,
I have created a folder at /home/usr/Desktop/myFolder. "myFolder" contains two files: 1. a.txt and 2. b.txt. I compressed this to myFolder.tar.gz, and now I want to copy the compressed "myFolder.tar.gz" to an HDFS location for processing. What will be the command for this? I tried the below and got an error:
tar zxvf /home/usr/Desktop/myFolder.tar.gz myFolder/a.txt myFolder/b.txt -O | hadoop fs -put /myHDFSFolder/mergedFile.txt
Could anyone please suggest a solution for this?
Created 02-15-2016 09:29 AM
Hi Rushikesh,
The command to compress files to tar.gz is tar -zcvf; to extract, use tar -zxvf.
To move the compressed file into HDFS, you can use: hadoop fs -put myFolder.tar.gz /myHDFSFolder/
The complete command would be:
SYNTAX:
tar zcvf <archive name>.tar.gz <source file 1> <source file 2> && hadoop fs -put <archive name>.tar.gz <destination location>
tar zcvf myFolder.tar.gz a.txt b.txt && hadoop fs -put myFolder.tar.gz /myHDFSFolder/
Note the two steps are chained with && rather than a pipe: hadoop fs -put reads the named local file, so the archive must be fully written before the upload starts.
Hope this helps!
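For reference, hadoop fs -put can also read from standard input when the source is given as "-", which is closer to what the original command attempted. A minimal sketch, reusing the paths from the question:
# Extract the two archive members to stdout (-O) and stream the
# concatenated contents into a single HDFS file; "-" makes -put read stdin.
tar -zxOf /home/usr/Desktop/myFolder.tar.gz myFolder/a.txt myFolder/b.txt | hadoop fs -put - /myHDFSFolder/mergedFile.txt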
Created 02-15-2016 09:20 AM
Just do hdfs dfs -copyFromLocal myFolder.tar.gz /hdfs/destination/path
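If the archive is not in the current working directory, the same command works with an absolute path; a quick sketch (paths assumed from the question), with a listing to confirm the copy:
hdfs dfs -copyFromLocal /home/usr/Desktop/myFolder.tar.gz /myHDFSFolder/
hdfs dfs -ls /myHDFSFolder/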
Created 02-20-2016 01:43 PM
@Rahul Pathak, thanks for sharing this info.
Created 02-15-2016 09:38 AM
@kgopal, thanks for sharing this information. I got the information I needed, so I'm accepting this as the best answer.
Created 01-27-2020 12:16 PM
Thanks for the information. Using this command caused some serious performance degradation when writing to HDFS: every 128 MB block took about 20-30 seconds to write. The issue was the cost of gzip-compressing the tar stream. It's better to remove the "z" flag from tar and not compress.
Just to provide some numbers: writing almost 1 TB of data from local disk to HDFS took 13+ hours with compression (z), and it would eventually fail due to Kerberos ticket expiration. After removing the "z" flag, the copy to HDFS took less than an hour for the same 1 TB of data!
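A sketch of that uncompressed approach (the source path and target file name here are hypothetical): the plain tar can be built on the fly and streamed straight into HDFS, so nothing extra is written to local disk.
# Build an uncompressed tar on stdout and stream it into HDFS;
# omitting -z avoids the gzip CPU bottleneck described above.
tar -cf - /path/to/data | hadoop fs -put - /myHDFSFolder/data.tar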