ListS3 Processor includes "parent" path as a flow file

Super Collaborator

Suppose I have some data in S3:

s3://my_bucket/my_path/to/my/data/myfile.txt

And suppose I use a ListS3 processor on that bucket and pass "my_path/to/my/data/" as the prefix.

I will get TWO flow files:

"s3://my_bucket/my_path/to/my/data/myfile.txt"

and

"s3://my_bucket/my_path/to/my/data/"

even though the latter is just a partial key that doesn't represent an object.

How can I tune my settings to only get the entry for "myfile.txt"?

Thanks in advance!

2 REPLIES

What happens when you pass that to FetchS3Object? My first thought is that ListS3 should not produce flowfiles for anything other than retrievable objects/files. If it does, then it is either a bug, or there should be a supported mode in which the directories/buckets themselves aren't listed, only their content.

Super Collaborator

Agreed!

It certainly appears to be a bug.
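
One possible explanation, which nobody in this thread has confirmed, is that the "parent" entry is a real zero-byte placeholder object: creating a folder in the S3 console stores an object whose key ends in "/". If that is the case here, a listing will return it like any other key. As a rough sketch of the filtering logic only (the function names and the sample sizes below are hypothetical, and this is plain Python, not NiFi code):

```python
def is_folder_placeholder(key: str, size: int) -> bool:
    """True for a zero-byte key ending in '/', the shape of an S3 'folder' marker."""
    return key.endswith("/") and size == 0


def retrievable_keys(listing):
    """From (key, size) pairs, keep only the keys that are not folder placeholders."""
    return [key for key, size in listing if not is_folder_placeholder(key, size)]


# Hypothetical listing for the prefix "my_path/to/my/data/"; the sizes are made up.
listing = [
    ("my_path/to/my/data/", 0),
    ("my_path/to/my/data/myfile.txt", 1024),
]
filtered = retrievable_keys(listing)
print(filtered)
```

Inside a flow, the same check could presumably be done with a RouteOnAttribute processor placed after ListS3, using an Expression Language condition such as ${filename:endsWith('/')} to route the placeholder entry away. That is a suggestion and has not been tested in this thread.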
