Unzip a zipped archive on the fly while copying from S3

New Contributor

Hi,

I need to copy a compressed archive (.tar.gz) from Amazon S3 into HDFS and uncompress it in the process, so that the files and sub-directories contained in the archive are created on HDFS. I can keep the archives on S3 as either .zip or .tar.gz.

What is the best way to achieve this while avoiding multiple hops? I have looked at tools such as s3distcp, but they seem to handle only individually compressed files, not archives. Any help would be appreciated.

1 REPLY
Re: Unzip a zipped archive on the fly while copying from S3

Master Guru
I'm not aware of any existing tool (at least among the Apache projects) that can do this for you today. You'll likely have to look for a third-party tool, or build your own utility that reads the archive from the source and writes its entries out as individual files at the destination.
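
If you do end up building your own utility, here is a minimal sketch of the idea in Python, not a finished tool. It relies on the standard library's `tarfile` stream mode (`"r|gz"`), which reads the archive strictly front to back, so each entry can be written out as it arrives without first landing the whole archive somewhere. The names `stream_untar`, `open_dest`, `make_dir` and `local_sink` are made up for this illustration, and the local-directory sink merely stands in for whatever HDFS client you would actually use. This is also a reason to prefer .tar.gz over .zip on S3: a zip file keeps its index at the end of the file, so it generally cannot be unpacked from a forward-only stream.

```python
import os
import shutil
import tarfile


def stream_untar(source, open_dest, make_dir):
    """Unpack a .tar.gz read sequentially from `source`, one entry at a time.

    source    : binary file-like object with read(); it need not be seekable
    open_dest : hypothetical callable(path) -> writable binary file object
    make_dir  : hypothetical callable(path) that creates a destination directory
    Returns the list of file paths written.
    """
    written = []
    # "r|gz" treats the input as a forward-only gzip stream.
    with tarfile.open(fileobj=source, mode="r|gz") as archive:
        for member in archive:
            # Refuse absolute paths and ".." components from untrusted archives.
            if member.name.startswith("/") or ".." in member.name.split("/"):
                raise ValueError("unsafe path in archive: " + member.name)
            if member.isdir():
                make_dir(member.name)
            elif member.isfile():
                # In stream mode each entry must be read while iterating, in order.
                data = archive.extractfile(member)
                with open_dest(member.name) as out:
                    shutil.copyfileobj(data, out)
                written.append(member.name)
            # Links and special files are skipped in this sketch.
    return written


def local_sink(root):
    """Stand-in destination that writes under a local directory.

    An HDFS-backed version would supply the same two callables using
    whichever HDFS client library you have available (an assumption here).
    """
    def make_dir(path):
        os.makedirs(os.path.join(root, path), exist_ok=True)

    def open_dest(path):
        full = os.path.join(root, path)
        # Archives do not always contain explicit entries for parent directories.
        os.makedirs(os.path.dirname(full), exist_ok=True)
        return open(full, "wb")

    return open_dest, make_dir
```

In practice `source` could be the streaming body of an S3 GET request (for example the `Body` object returned by boto3's `get_object`, which exposes `read()`), with `open_dest` wrapping an HDFS file writer. Both of those are assumptions about your environment rather than something tested here. Note that this runs on a single machine, so it is one hop but not a distributed copy.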