ListS3 Processor includes "parent" path as a flow file

Suppose I have some data in s3:

s3://my_bucket/my_path/to/my/data/myfile.txt

And suppose I use a ListS3 processor with that bucket and pass "my_path/to/my/data/" as the prefix.

I will get TWO flow files:

"s3://my_bucket/my_path/to/my/data/myfile.txt"

and

"s3://my_bucket/my_path/to/my/data/"

even though the latter is just a partial key that doesn't represent an object.

How can I tune my settings to only get the entry for "myfile.txt"?

Thanks in advance!

Re: ListS3 Processor includes "parent" path as a flow file

What happens when you pass that to FetchS3Object? My first thought is that ListS3 should only produce output flowfiles for retrievable objects/files. If it produces anything else, then either it is a bug, or a mode should be supported that lists only the content of directories/buckets and not the directories/buckets themselves.

Re: ListS3 Processor includes "parent" path as a flow file

Agreed!

It certainly appears to be a bug.
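
In the meantime, a possible workaround is to drop the "parent" entry after listing. Below is a minimal Python sketch of that filtering logic, not NiFi code. It assumes the extra entry is a zero-byte placeholder whose key ends in "/" (something this thread does not confirm), and the function names, the (key, size) tuple shape, and the sizes are hypothetical, chosen only to mirror the example in the question.

```python
def is_placeholder_key(key: str, size: int) -> bool:
    """Return True for entries that look like folder markers, not files.

    Assumption: a marker is a zero-byte entry whose key ends in "/".
    """
    return key.endswith("/") and size == 0


def retrievable_keys(listing):
    """Keep only the keys that look like real, retrievable objects.

    `listing` is an iterable of (key, size_in_bytes) pairs.
    """
    return [key for key, size in listing if not is_placeholder_key(key, size)]


# Hypothetical listing mirroring the example in the question.
listing = [
    ("my_path/to/my/data/", 0),
    ("my_path/to/my/data/myfile.txt", 1024),
]

print(retrievable_keys(listing))
```

Inside NiFi, the same check could presumably be done with a RouteOnAttribute processor placed after ListS3, using an Expression Language condition such as `${filename:endsWith('/')}` to route the unwanted entry away. This assumes the object key is carried in the `filename` attribute.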