Support Questions

rgelhausen · ‎06-13-2016

jfrazee · ‎06-13-2016

@Randy Gelhausen To specify a path, you should be able to include it as part of the Object Key in the PutS3Object properties; for example, something like my/path/${filename} rather than just ${filename}.

rgelhausen · ‎06-13-2016

Do I just use '/' as the directory separator?

jfrazee · ‎06-13-2016

Probably yes.

A mostly unimportant detail here: since S3 is really just a key-value store, path delimiters are only meaningful to the higher-level APIs and tools, i.e., the things that let you do prefix listings. So the delimiter can be anything, but in practice people almost never change it from the default '/'.
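To make the delimiter point concrete, here is a minimal Python sketch that imitates a delimiter-based prefix listing over a flat set of keys. It is an in-memory illustration only, not the S3 API; `list_prefix` and the sample keys are hypothetical names made up for this example.

```python
def list_prefix(keys, prefix="", delimiter="/"):
    """Imitate a delimiter-based listing over a flat collection of keys.

    Returns (objects, common_prefixes): keys directly "under" the prefix,
    and the rolled-up "subfolders" one level below it.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        idx = rest.find(delimiter) if delimiter else -1
        if idx == -1:
            # No delimiter after the prefix: report the key itself.
            objects.append(key)
        else:
            # Roll everything past the first delimiter up into one "folder".
            common_prefixes.add(prefix + rest[:idx + len(delimiter)])
    return objects, sorted(common_prefixes)


# Hypothetical keys; the store itself is flat.
keys = [
    "tables/orders/part-0001.csv",
    "tables/orders/part-0002.csv",
    "tables/customers/part-0001.csv",
    "tables/README.txt",
]

# With '/' as the delimiter the keys look like folders.
slash_listing = list_prefix(keys, prefix="tables/", delimiter="/")
# With a different delimiter the very same keys group differently.
dash_listing = list_prefix(keys, prefix="tables/", delimiter="-")
print(slash_listing)
print(dash_listing)
```

The keys never change between the two calls; only the grouping applied at listing time does, which is why the choice of delimiter is a convention rather than a property of the stored objects.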

rgelhausen · ‎06-13-2016

Does that mean I should use separate buckets for different datasets? I've seen single buckets housing multiple different tables via "folders". Is that a bad idea?

jfrazee · ‎06-13-2016

Typically no, you don't need separate buckets, and keeping several datasets in one bucket under different prefixes is fine. You're actually limited in the number of buckets you can create, whereas the number of objects, and thus prefixes, is effectively unlimited. The situation where you want different buckets is when you need different bucket policies; e.g., for data lifecycle (with or without versioning, automatic archiving to Glacier), security, and environment (dev, test, prod).

The design of prefixes/key names/directories should then be guided by your access patterns, with the same sorts of considerations you'd apply when organizing data in HDFS. Listings over prefixes (recursive listings) can be slow, so if you're going to do listings you'll want enough hierarchy or structure in your key names that the result sets don't get huge. If you're only ever going to access specific keys, this is less of an issue.
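As a rough illustration of how key structure bounds the size of a listing, here is a small Python sketch. The dataset/year/month/day layout, the `build_key` helper, and the dataset names are all assumptions for the example, not a recommended standard, and the filtering is done in memory rather than against S3.

```python
from datetime import date


def build_key(dataset, day, filename):
    """Build a key like 'orders/2016/06/13/part-0001.csv' (hypothetical layout)."""
    return f"{dataset}/{day:%Y/%m/%d}/{filename}"


# A flat set of keys: two datasets, 30 days, three files per day.
keys = {
    build_key(dataset, date(2016, 6, d), f"part-{n:04d}.csv")
    for dataset in ("orders", "customers")
    for d in range(1, 31)
    for n in range(3)
}

# A broad prefix matches a whole dataset; a narrow one matches a single day.
whole_dataset = [k for k in keys if k.startswith("orders/")]
single_day = [k for k in keys if k.startswith("orders/2016/06/13/")]
print(len(keys), len(whole_dataset), len(single_day))
```

If most of your reads are "give me one day of one table", a layout along these lines keeps each listing small; if your access pattern is different, the hierarchy should follow that pattern instead.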

TerryP · ‎06-13-2016

@Randy Gelhausen

S3 does not have a directory hierarchy per se, but it does allow Object Keys to contain the "/" character. You are dealing with a key-value store: one object can have a key of /my/fake/directory/a/file and another can have a key of /my/fake/directory/b/file. The objects are named similarly, and most tools that speak S3 will display them as if they were files in a directory hierarchy, but there is no directory structure behind the objects. That is the key takeaway when dealing with S3. When you store or retrieve an object with S3, you have to reference the entire key for the object and the bucket that contains it. The paradigm of directories and files is just an illusion.

Use the Object Key in the method call as @jfrazee said and you should be good to go.
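A toy Python model of the same idea, in case it helps: the bucket is just a dict from full key to bytes. This is an illustration only, not how you talk to S3, and the object contents are made up.

```python
# A bucket modelled as a plain dict: full key -> object bytes (illustration only).
bucket = {
    "/my/fake/directory/a/file": b"contents of a",
    "/my/fake/directory/b/file": b"contents of b",
}

# Retrieval needs the entire key.
obj = bucket["/my/fake/directory/a/file"]

# The "directory" is not an entry of its own, so there is nothing to look up.
has_directory = "/my/fake/directory" in bucket

# A "directory listing" is just a filter over key names.
listing = sorted(k for k in bucket if k.startswith("/my/fake/directory/"))
print(obj, has_directory, listing)
```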
