New Contributor
Posts: 3
Registered: ‎05-19-2014

Unzip a zipped archive on the fly while copying from S3

Hi,

 

I need to copy a compressed archive (.tar.gz) from Amazon S3 into HDFS and, in the process, uncompress it so that the files and subdirectories contained in the archive are created in HDFS. I can keep the archive on S3 as either .zip or .tar.gz.

 

What is the best way to achieve this while avoiding multiple hops? The tools I have seen, such as s3distcp, seem to handle only individually compressed files, not archives. Any help would be appreciated.

Posts: 1,903
Kudos: 435
Solutions: 307
Registered: ‎07-31-2013

Re: Unzip a zipped archive on the fly while copying from S3

There are no existing tools that I am aware of (at least in the Apache project space) that can do this for you today. You'll likely have to look for a third-party tool, or build your own utility that reads the archive from the source and writes its entries out as individual files at the destination.
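
If you go the build-your-own route, a rough sketch in Python might look like the following. It is only an illustration under some assumptions, not a tested S3-to-HDFS tool: `stream_untar`, `open_dest`, and `dest_root` are hypothetical names, and `open_dest` stands in for whatever call your HDFS client provides to open a file for writing. The idea is to read the .tar.gz as a forward-only stream and write each entry straight to the destination, so nothing is staged on local disk. This also suggests keeping the archive as .tar.gz rather than .zip, since a zip's central directory sits at the end of the file and normally requires seeking.

```python
import io
import posixpath
import shutil
import tarfile


def stream_untar(src, open_dest, dest_root):
    """Unpack a .tar.gz read sequentially from `src` into files under `dest_root`.

    src: binary file-like object, read forward-only (e.g. an S3 object body).
    open_dest: hypothetical callable(path) -> writable binary file object,
               e.g. a thin wrapper around an HDFS client's create/write call.
    Returns the list of destination paths written, in archive order.
    """
    written = []
    # "r|gz" is tarfile's streaming mode: the source is never seeked.
    with tarfile.open(fileobj=src, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                # Directories are implied by the file paths; links are skipped.
                continue
            name = posixpath.normpath(member.name)
            if name.startswith("/") or name == ".." or name.startswith("../"):
                raise ValueError(f"unsafe path in archive: {member.name}")
            dest_path = posixpath.join(dest_root, name)
            data = archive.extractfile(member)
            with open_dest(dest_path) as out:
                shutil.copyfileobj(data, out)  # copies in chunks, not all at once
            written.append(dest_path)
    return written
```

For `src` you would pass the readable body of the S3 object (with boto3, I believe the `Body` of a `get_object` response can be read this way, but check that against your client version). Because entries are consumed in archive order, each one has to be fully copied before moving on to the next.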