
How to put a compressed folder into HDFS?


Hi,

I have created a folder at /home/usr/Desktop/myFolder. myFolder contains two files: 1. a.txt and 2. b.txt. I compressed it to myFolder.tar.gz, and now I want to copy this compressed myFolder.tar.gz to an HDFS location for processing. What is the command for this? I tried the following and got an error:

tar zxvf /home/usr/Desktop/myFolder.tar.gz myFolder/a.txt myFolder/b.txt -O | hadoop fs -put /myHDFSFolder/mergedFile.txt

Could anyone please suggest a solution?
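For reference, hadoop fs -put only reads from a pipe when the local source is given as - (a dash), which the attempted command is missing. With the same paths as above, the corrected pipe would look like:

# '-' tells hadoop fs -put to read from stdin (paths taken from the question)
tar zxvf /home/usr/Desktop/myFolder.tar.gz myFolder/a.txt myFolder/b.txt -O | hadoop fs -put - /myHDFSFolder/mergedFile.txt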
5 REPLIES

Super Collaborator

Just do hdfs dfs -copyFromLocal myFolder.tar.gz /hdfs/destination/path
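A minimal end-to-end sketch, assuming the local folder and HDFS paths from the question (the destination directory name is illustrative):

# create the compressed archive from the parent directory
tar czf /home/usr/Desktop/myFolder.tar.gz -C /home/usr/Desktop myFolder
# make sure the HDFS destination exists (assumed path)
hdfs dfs -mkdir -p /myHDFSFolder
# copy the archive into HDFS
hdfs dfs -copyFromLocal /home/usr/Desktop/myFolder.tar.gz /myHDFSFolder/
# verify the file landed
hdfs dfs -ls /myHDFSFolder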


@Rahul Pathak, thanks for sharing this info.



@kgopal, thanks for sharing this information. I got the information I needed, so I'm accepting this as the best answer.

New Contributor

Thanks for the information. Using this approach caused serious performance degradation when writing to HDFS: every 128 MB block took about 20-30 seconds to write. The issue was the cost of compressing the tar stream. It's better to remove the "z" flag from tar and skip compression.

To give some numbers: writing almost 1 TB of data from local disk to HDFS took 13+ hours with compression (z), and it would eventually fail due to Kerberos ticket expiration. After removing the "z" flag, the copy to HDFS took less than an hour for the same 1 TB of data!
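For comparison, a sketch of the two pipelines, assuming the tar output is piped straight into HDFS as in the original attempt (destination paths are illustrative):

# slow: gzip compression in the pipe bottlenecks every block write
tar czf - myFolder | hdfs dfs -put - /myHDFSFolder/myFolder.tar.gz
# fast: plain tar stream, no compression
tar cf - myFolder | hdfs dfs -put - /myHDFSFolder/myFolder.tar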