
How to unzip a file at a Hadoop path (the zip file contains multiple files) without copying it to local?


How can I unzip a file at a Hadoop path (the zip file contains multiple files) without copying it to the local filesystem?


Re: How to unzip a file at a Hadoop path (the zip file contains multiple files) without copying it to local?

If the compressed file contained just one file, the Pig approach shown in https://stackoverflow.com/questions/34573279/how-to-unzip-gz-files-in-a-new-directory-in-hadoop might have been useful. Whatever data access framework you use, the extraction will have to run in a single mapper, so it won't be a parallelized job, but I understand your desire to save the time and network traffic of pulling the file from HDFS and then putting the contents back once extracted.

The Java Map/Reduce example at http://cutler.io/2012/07/hadoop-processing-zip-files-in-mapreduce/ also assumes the compressed file holds a single file, but it could be a starting point for some custom work of your own.
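
As a rough idea of what that custom work might look like, here is a minimal sketch of the stream-to-stream part, using only java.util.zip from the JDK. The class name ZipStreamExtractor and the EntrySink callback are hypothetical names invented for this illustration; they are not part of Hadoop or of either linked example. The sketch reads the archive entry by entry from an input stream and copies each file to whatever output stream the callback opens, so nothing has to be staged on local disk.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

/** Hypothetical helper: streams every file in a zip archive to a caller-supplied sink. */
class ZipStreamExtractor {

    /** Hypothetical callback that opens the destination stream for one entry name. */
    interface EntrySink {
        OutputStream open(String entryName) throws IOException;
    }

    /** Copies each non-directory entry to the sink and returns the entry names written. */
    static List<String> extractAll(InputStream zipBytes, EntrySink sink) throws IOException {
        List<String> written = new ArrayList<>();
        byte[] buffer = new byte[64 * 1024];
        try (ZipInputStream zip = new ZipInputStream(zipBytes)) {
            ZipEntry entry;
            // getNextEntry() positions the stream at the next entry, or returns null at the end.
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // only file entries carry data to copy
                }
                try (OutputStream out = sink.open(entry.getName())) {
                    int n;
                    // read() returns -1 once the current entry is exhausted.
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                written.add(entry.getName());
            }
        }
        return written;
    }
}
```

Assuming the standard org.apache.hadoop.fs.FileSystem API, the input could come from fs.open(zipPath) and the sink could be name -> fs.create(new Path(targetDir, name)), so both ends stay in HDFS. That wiring is untested here, and the whole archive is still read sequentially by one task, which matches the single-mapper limitation mentioned above.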

Good luck and happy Hadooping!