
How can I specify S3 bucket folders/paths with PutS3Object?


@Randy Gelhausen To specify a path, you should be able to include it as part of the Object Key in the processor properties.


Do I just use '/' as the directory separator?

Probably yes.

One minor point: because S3 is really just a key-value store, path delimiters only mean something to the higher-level APIs and tools, i.e., the ones that let you do prefix listings. The delimiter can be anything, but in practice people almost never change it from the default '/'.
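To make the delimiter point concrete, here is a minimal sketch in plain Python. It does not talk to S3; `list_with_delimiter` is a hypothetical helper that only imitates how a delimiter-based prefix listing groups a flat set of keys into "files" and "subfolders". The key names are made up for illustration.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Imitate a delimiter-based prefix listing over a flat set of keys.

    Returns (objects, common_prefixes): the keys directly "under" the prefix,
    and the rolled-up "subfolder" prefixes one level down.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything past the next delimiter is rolled up into one prefix.
            common_prefixes.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(common_prefixes)


# Hypothetical keys; the store itself sees them only as opaque strings.
keys = ["logs/2016/a.csv", "logs/2016/b.csv", "logs/readme.txt", "tables/users.csv"]
objects, prefixes = list_with_delimiter(keys, prefix="logs/")
print(objects)   # keys that look like files directly in logs/
print(prefixes)  # prefixes that look like subfolders of logs/
```

The same function works with any other delimiter, such as '|', which is the sense in which '/' is only a convention.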

Does that mean I should use separate buckets for different datasets? I've seen single buckets housing multiple different tables via "folders". Is that a bad idea?

Typically no; a single bucket with prefixes is fine. The number of buckets you can create is limited, whereas the number of objects, and thus prefixes, is effectively unlimited. Where you do want different buckets is when you need different bucket policies, e.g., for data lifecycle (versioning on or off, automatic archiving to Glacier), security, or environment (dev, test, prod).

The design of prefixes/key names/directories should then be guided by your access patterns, with the same sorts of considerations you have for organizing data in HDFS. Prefix listings and recursive listings can be slow, so if you're going to do listings, you'll want enough hierarchy or structure in your key names that those result sets don't get huge. If you're only ever going to access specific keys, this is less of an issue.
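As a rough illustration of structuring key names around access patterns, the sketch below builds date-partitioned keys and compares how many keys a narrow prefix matches versus a broad one. The `dataset/YYYY/MM/DD/filename` layout, the `partitioned_key` helper, and the dataset name are assumptions chosen for the example, not a recommendation from this thread.

```python
from datetime import date


def partitioned_key(dataset, day, filename):
    """Build a key such as 'events/2016/03/14/part-0001.csv' (assumed layout)."""
    return f"{dataset}/{day:%Y/%m/%d}/{filename}"


# Hypothetical dataset: 30 days of data, 10 files per day.
keys = [
    partitioned_key("events", date(2016, 3, d), f"part-{n:04d}.csv")
    for d in range(1, 31)
    for n in range(10)
]

# A listing scoped to one day touches far fewer keys than one over the dataset.
one_day = [k for k in keys if k.startswith("events/2016/03/14/")]
whole_dataset = [k for k in keys if k.startswith("events/")]
print(len(one_day), len(whole_dataset))
```

If the jobs reading this data usually ask for one day at a time, a layout like this keeps each listing small; a different access pattern would suggest a different hierarchy.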


@Randy Gelhausen

S3 does not have a directory hierarchy per se, but it does allow Object Keys to contain the "/" character. You are dealing with a key-value store: one object can have a key of /my/fake/directory/a/file and another can have a key of /my/fake/directory/b/file. The objects are named similarly, and most tools that speak S3 will display them as if they were files in a directory hierarchy, but there is no directory structure behind the objects. That is the key takeaway when dealing with S3. When you store or retrieve an object, you have to reference the entire key for the object and the bucket that contains it. The paradigm of directories and files is just an illusion.
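The flat key-value model can be sketched with an ordinary Python dict. This is only a stand-in for S3, not a real client: `put_object`, `get_object`, and the bucket name `my-bucket` are hypothetical names used for illustration.

```python
store = {}  # maps (bucket, key) -> bytes; a stand-in for S3, not a real client


def put_object(bucket, key, body):
    """Store a value under the full (bucket, key) pair."""
    store[(bucket, key)] = body


def get_object(bucket, key):
    """Fetch a value; raises KeyError unless the full key matches exactly."""
    return store[(bucket, key)]


put_object("my-bucket", "/my/fake/directory/a/file", b"A")
put_object("my-bucket", "/my/fake/directory/b/file", b"B")

# The shared "directory" was never created; it is not an entry in the store.
has_directory = ("my-bucket", "/my/fake/directory") in store
print(get_object("my-bucket", "/my/fake/directory/a/file"), has_directory)
```

Nothing in the store corresponds to the shared path; it exists only as a common beginning of two key strings.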

Use the Object Key in the method call as @jfrazee said and you should be good to go.
