
How to unzip a file at a Hadoop path (the zip file contains multiple files) without copying it to the local filesystem?


1 REPLY


If the archive contained just one file, the Pig approach shown in https://stackoverflow.com/questions/34573279/how-to-unzip-gz-files-in-a-new-directory-in-hadoop might have been useful. No matter what you do, you'll have to do this in a single mapper from whatever data access framework you use, so it won't be a parallelized job. Still, I understand your desire to save the time and network traffic of pulling the file from HDFS and putting it back once extracted.

The Java Map/Reduce example at http://cutler.io/2012/07/hadoop-processing-zip-files-in-mapreduce/ also assumes the compressed file holds a single file, but it could be a starting point for some custom work of your own.
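To give an idea of what that custom work might look like, here is a minimal sketch (not taken from either link) that streams a multi-file zip entry by entry using only java.util.zip. The class name ZipStreamExtractor and the TargetOpener callback are hypothetical names I made up for illustration. The idea is that the input stream would come from HDFS and each entry would be written straight back to HDFS, so nothing lands on local disk, though all the bytes still flow through the one client or mapper running it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipStreamExtractor {

    /** Hypothetical callback: opens the destination stream for one archive entry. */
    public interface TargetOpener {
        OutputStream open(String entryName) throws IOException;
    }

    /**
     * Reads a zip archive sequentially from `zipped` and copies every file
     * entry to the stream returned by `opener`. Returns the entry names written.
     */
    public static List<String> extract(InputStream zipped, TargetOpener opener)
            throws IOException {
        List<String> written = new ArrayList<>();
        byte[] buffer = new byte[64 * 1024];
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // only file entries carry data
                }
                try (OutputStream out = opener.open(entry.getName())) {
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zin.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                written.add(entry.getName());
            }
        }
        return written;
    }
}

// Assumed HDFS wiring (needs hadoop-client on the classpath; paths are made up):
//
//   FileSystem fs = FileSystem.get(new Configuration());
//   Path outDir = new Path("/data/out");
//   try (InputStream in = fs.open(new Path("/data/in/archive.zip"))) {
//       ZipStreamExtractor.extract(in, name -> fs.create(new Path(outDir, name)));
//   }
```

If you try something along these lines, it would be worth validating entry names (for example rejecting any that contain "..") before building the output path.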

Good luck and happy Hadooping!