
How can I specify S3 bucket folders/paths with PutS3Object?

Re: How can I specify S3 bucket folders/paths with PutS3Object?

@Randy Gelhausen To specify a path, you should be able to include it as part of the Object Key in the properties.

Re: How can I specify S3 bucket folders/paths with PutS3Object?

Do I just use '/' as the directory separator?

Re: How can I specify S3 bucket folders/paths with PutS3Object?

Probably yes.

One minor point: since S3 is really just a key-value store, path delimiters are only meaningful to the higher-level APIs and tools, i.e., the ones that let you do prefix listings. So the delimiter can be anything, but in practice people almost never change it from the default '/'.
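
To make the delimiter point concrete, here is a minimal sketch in plain Python of what a delimiter-aware prefix listing does over a flat set of keys. It is a toy model, not the S3 API: `list_keys` and the sample keys are made up for illustration, and the second call uses '|' only to show that nothing about '/' is special to the stored keys themselves.

```python
def list_keys(keys, prefix="", delimiter="/"):
    """Toy model of a prefix listing over a flat key set (not the real S3 API).

    Keys directly under the prefix are returned as objects. Keys that contain
    the delimiter after the prefix are rolled up into one "folder" entry.
    """
    objects = []
    common_prefixes = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # The delimiter appears after the prefix: report the "folder" once.
            rolled = prefix + head + sep
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            objects.append(key)
    return objects, common_prefixes


# Hypothetical keys, all living side by side in one flat namespace.
keys = ["logs/2016/a.txt", "logs/2016/b.txt", "logs/readme.txt", "data|x|1.csv"]

print(list_keys(keys, prefix="logs/"))                  # '/' as the delimiter
print(list_keys(keys, prefix="data|", delimiter="|"))   # '|' works the same way
```

The "folders" only appear when the listing is asked to group by a delimiter; the stored keys are just strings either way.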

Re: How can I specify S3 bucket folders/paths with PutS3Object?

Does that mean I should use separate buckets for different datasets? I've seen single buckets housing multiple different tables via "folders". Is that a bad idea?

Re: How can I specify S3 bucket folders/paths with PutS3Object?

Typically no. You're actually limited in the number of buckets you can create, whereas the number of objects, and thus prefixes, is effectively unlimited. The situation where you do want different buckets is when you need different bucket policies, e.g., for data lifecycle (with or without versioning, automatic archiving to Glacier), security, and environment (dev, test, prod).

The design of prefixes/key names/directories should then be guided by your access patterns, with the same sorts of considerations you'd have for organizing data in HDFS. Listings over prefixes and recursive listings can be slow, so if you're going to do listings, you'll want enough hierarchy or structure in your key names that the result sets don't get huge. If you're only ever going to access specific keys, this is less of an issue.
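
As a rough illustration of "enough hierarchy", here is a small Python sketch that builds date-partitioned key names so that a listing can be narrowed to one day's worth of objects. The layout (`dataset/YYYY/MM/DD/filename`), the helper names, and the sample data are all assumptions for the example, not a recommended or required scheme.

```python
from datetime import date


def build_key(dataset, day, filename):
    """Build a key like 'events/2016/01/05/part-0001.json' (hypothetical layout)."""
    return f"{dataset}/{day.strftime('%Y/%m/%d')}/{filename}"


def keys_under(keys, prefix):
    """Stand-in for a prefix listing: return only the keys that share the prefix."""
    return [k for k in sorted(keys) if k.startswith(prefix)]


# Made-up sample keys for one dataset spread over two days.
keys = [
    build_key("events", date(2016, 1, 5), "part-0001.json"),
    build_key("events", date(2016, 1, 5), "part-0002.json"),
    build_key("events", date(2016, 1, 6), "part-0001.json"),
]

# Listing a single day touches only that day's keys, not the whole dataset.
print(keys_under(keys, "events/2016/01/05/"))
```

The same idea applies to whatever dimension you actually list by (table, date, source system): put it early enough in the key that the prefix keeps each result set small.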

Re: How can I specify S3 bucket folders/paths with PutS3Object?

@Randy Gelhausen

S3 does not have a directory hierarchy per se, but it does allow Object Keys to contain the "/" character. You are dealing with a key-value store: one object can have a key of /my/fake/directory/a/file and another can have a key of /my/fake/directory/b/file. The objects are named similarly, and most tools that speak S3 will display the objects as if they were files in a directory hierarchy, but there is no directory structure behind them. That is the key takeaway when dealing with S3. When you store or retrieve an object with S3, you have to reference the entire key for the object and the bucket that contains it. The paradigm of directories and files is just an illusion.
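
Here is a tiny Python model of that idea, using a plain dict in place of S3. The `put` and `get` helpers and the bucket name `my-bucket` are hypothetical and only stand in for the real calls; the point is that the lookup needs the bucket plus the whole key, and the "directory" on its own is not something you can fetch.

```python
# Toy stand-in for S3: a flat mapping from (bucket, full key) to object bytes.
store = {}


def put(bucket, key, body):
    """Store an object under its complete key; no directories are created."""
    store[(bucket, key)] = body


def get(bucket, key):
    """Fetch an object; the key must match exactly, character for character."""
    return store[(bucket, key)]


put("my-bucket", "/my/fake/directory/a/file", b"first")
put("my-bucket", "/my/fake/directory/b/file", b"second")

# The full key works.
print(get("my-bucket", "/my/fake/directory/a/file"))

# The "directory" is only a shared prefix of two keys, not an object.
try:
    get("my-bucket", "/my/fake/directory")
except KeyError:
    print("no such key: the directory does not exist as an object")
```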

Use the Object Key in the method call as @jfrazee said and you should be good to go.