Support Questions

krajkumar14 · 03-01-2019

How do I unzip a file in a Hadoop path (the zip file contains multiple files) without copying it to the local filesystem?

LesterMartin · 03-06-2019

If the compressed file held just one file, the Pig approach shown in https://stackoverflow.com/questions/34573279/how-to-unzip-gz-files-in-a-new-directory-in-hadoop might have been useful. No matter what you do, you'll have to do this in a single mapper from whatever data access framework you use, so it won't be a parallelized job, but I understand your desire to save the time and network traffic of pulling the file from HDFS and then putting the contents back in once extracted.

The Java Map/Reduce example at http://cutler.io/2012/07/hadoop-processing-zip-files-in-mapreduce/ also assumes the compressed file is a single file, but it could be a starting point for some custom work you might be able to do.
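As a rough illustration of that kind of custom work, here is a minimal Python sketch that walks every entry of a multi-file ZIP and streams each one to a destination writer without touching local disk. It is not taken from either link above. The function name `unzip_stream` and the `open_dest` callback are hypothetical, and the HDFS client is an assumption: something like pyarrow's `HadoopFileSystem` (`open_input_file` for a seekable read handle, `open_output_stream` for writing) should fit, but I have not tested it against a cluster. It runs in a single process, so it is not parallel, and the bytes still pass through whichever machine runs it.

```python
import posixpath
import shutil
import zipfile
from typing import BinaryIO, Callable, List


def unzip_stream(src: BinaryIO,
                 open_dest: Callable[[str], BinaryIO],
                 dest_dir: str) -> List[str]:
    """Extract every file entry of the ZIP read from `src` under `dest_dir`.

    `src` must be a seekable binary stream (zipfile reads the central
    directory at the end of the archive). `open_dest(path)` is a
    hypothetical callback that returns a writable binary stream for the
    given destination path, for example an HDFS output stream.
    Returns the list of destination paths written, in archive order.
    """
    root = dest_dir.rstrip("/") + "/"
    written = []
    with zipfile.ZipFile(src) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            target = posixpath.normpath(posixpath.join(dest_dir, info.filename))
            # Refuse entries such as "../x" that would land outside dest_dir.
            if not target.startswith(root):
                raise ValueError("unsafe entry name: " + info.filename)
            # Copy in chunks so a large entry is never held fully in memory.
            with archive.open(info) as member, open_dest(target) as out:
                shutil.copyfileobj(member, out)
            written.append(target)
    return written
```

With a pyarrow-style client (again, an assumption), the call would look roughly like `unzip_stream(fs.open_input_file(zip_path), fs.open_output_stream, out_dir)`, where `fs`, `zip_path`, and `out_dir` are placeholders for your own filesystem handle and paths.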

Good luck and happy Hadooping!

Cloudera Community
