How to unzip a file in a Hadoop path (the zip file contains multiple files) without copying it to local?
- Labels: Apache Hadoop
Created ‎03-01-2019 02:24 PM
How can I unzip a file in a Hadoop path (the zip file contains multiple files) without copying it to local?
Created ‎03-06-2019 11:01 PM
If the archive contained just one file, the Pig approach shown in https://stackoverflow.com/questions/34573279/how-to-unzip-gz-files-in-a-new-directory-in-hadoop might have been useful. Whatever data access framework you use, the extraction has to run in a single mapper because a zip archive is not splittable, so it won't be a parallelized job. Still, I understand your desire to save the time and network traffic of pulling the file from HDFS and putting it back once extracted.
The Java Map/Reduce example at http://cutler.io/2012/07/hadoop-processing-zip-files-in-mapreduce/ also assumes the compressed file contains a single file, but it could be a starting point for some custom work of your own.
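As a rough idea of what that custom work might look like, here is a minimal sketch that streams every entry of a multi-file zip from an input stream to a separate output stream, using only `java.util.zip`. The class `ZipStreamExtractor` and the `EntrySink` interface are hypothetical names made up for illustration, and nothing here is taken from the linked examples. It is still a single-threaded pass over the archive.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: walks a zip stream and copies each file entry to a sink.
final class ZipStreamExtractor {

    // Hypothetical hook: returns the stream that an entry's bytes are written to.
    interface EntrySink {
        OutputStream open(String entryName) throws IOException;
    }

    // Copies every non-directory entry to the sink and returns how many were written.
    static int extractAll(InputStream zipped, EntrySink sink) throws IOException {
        int count = 0;
        byte[] buffer = new byte[8192];
        try (ZipInputStream zin = new ZipInputStream(zipped)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                if (entry.isDirectory()) {
                    continue; // directories carry no data to copy
                }
                // NOTE: entry names are used as-is; untrusted archives should be
                // checked for names such as "../x" before building a target path.
                try (OutputStream out = sink.open(entry.getName())) {
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zin.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                }
                count++;
            }
        }
        return count;
    }
}
```

To keep everything on HDFS, I would assume the input comes from `FileSystem.open(new Path(...))` and the sink is something like `name -> fs.create(new Path(targetDir, name))`, so the bytes pass through the client JVM but never land on the local disk. I have not tested that wiring against a cluster, so treat it as a starting point only.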
Good luck and happy Hadooping!
