Get files recursively from S3 bucket
Labels: Apache NiFi
Created 10-21-2024 07:23 AM
Hi,
I am new to NiFi and trying to get this resolved.
We have an S3 bucket with the structure below,
and we are using the pattern below to get the files recursively.
This works if the files are in the home directory of the bucket,
but it doesn't work for folders or subfolders.
Please let me know how to get the files recursively.
Note: The same structure also has to be created locally with PutFile, which I think is possible once I get the files from FetchS3Object.
Created 10-21-2024 08:03 AM
@nifier -
I emulated the same setup as yours using a test S3 bucket.
Using an access key ID and secret access key that had full access to all of S3, I was able to receive all the objects (including those in subfolders) from that bucket.
I have a couple of follow-up questions:
1. In your ListS3 processor configuration, do you have anything set for Prefix or Delimiter? I want to make sure, because either one could be filtering out some of the files/directories coming from S3.
2. What are the IAM role and the bucket policy for the bucket you are trying to consume from? Are you certain that the role you are using can access all the objects in the bucket?
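To illustrate why the first question matters, here is a minimal sketch in plain Python (no AWS calls) of how a prefix and delimiter narrow an S3 listing: keys that contain the delimiter after the prefix are rolled up into "common prefixes" instead of being returned as objects. The function `list_keys` and the sample keys are hypothetical, modelled loosely on the layout described in this thread, and the sketch assumes ListS3 hands these settings through to the S3 list request.

```python
def list_keys(keys, prefix="", delimiter=""):
    """Simulate S3 list semantics; returns (contents, common_prefixes)."""
    contents, common_prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # filtered out by the prefix
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Nested key: rolled up into a common prefix, not listed as an object.
            rolled = prefix + rest.split(delimiter, 1)[0] + delimiter
            if rolled not in common_prefixes:
                common_prefixes.append(rolled)
        else:
            contents.append(key)
    return contents, common_prefixes


# Hypothetical keys; only 20242323/year/year.txt appears in the thread itself.
KEYS = [
    "20242323/top.txt",
    "20242323/year/year.txt",
    "20242323/month/month.txt",
]
```

With no prefix or delimiter, all three keys come back as objects. With `prefix="20242323/"` and `delimiter="/"`, only `20242323/top.txt` is returned as an object, and the `year/` and `month/` entries show up only as common prefixes, which would look like "subfolders are not being picked up".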
Created 10-21-2024 08:58 AM
Thank you for your reply.
1. In ListS3, I'm able to list the files; no issues there.
The issue is with the next step, FetchS3Object, which gives me the error below.
ERROR
FetchS3Object[id=99843c65-eeb7-1140-824f-1258d088506d] Failed to retrieve S3 Object for FlowFile[filename=20242323/year/year.txt];
routing to failure: com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 404;
Error Code: NoSuchKey; Request ID: tx00000631cd5ef67d0d1fd-006716305c-c9f4b0-ttce-stage-singlesite-zone;
S3 Extended Request ID: c9f4b0-ttce-stage-singlesite-zone-ttce-stage-singlesite-zonegroup; Proxy: null),
S3 Extended Request ID: c9f4b0-ttce-stage-singlesite-zone-ttce-stage-singlesite-zonegroup
2. Yes, we do have access to the bucket. We are able to fetch the objects recursively from a shell script using AWS S3 commands.
Created 10-21-2024 09:34 AM
What does your configuration for the FetchS3Object processor look like?
I would make sure you're pointing to the correct region in the FetchS3Object processor, and that your AWSCredentialsService or the credentials set in the processor are configured correctly.
Created 10-21-2024 10:12 AM
The credentials and region are correct, as I'm able to fetch files if they are directly under the home directory (20242323) of the bucket. The issue is fetching files that are under folders/subfolders, like "month" or "year" in this case.
ListS3 Properties
FetchS3Object Properties
Created 10-21-2024 11:32 AM
For the record:
I was able to resolve the fetching issue; the port number was missing from the overridden endpoint URL.
However, I now get a different error when writing the file to local disk.
PutFile[id=07413ab5-d3d2-1e9a-99b0-c0f57682f17c] Penalizing FlowFile[filename=20242323/year/year.txt] and transferring to failure: org.apache.nifi.processor.exception.FlowFileAccessException: Failed to export StandardFlowFileRecord[uuid=35019b7d-1e33-44db-8432-6a54b2f6586e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1729535188764-436, container=default, section=436], offset=70, length=16],offset=0,name=20242323/year/year.txt,size=16] to /apps/fex/shared/mina/archive/.20242323/year/year.txt due to java.io.FileNotFoundException: /apps/fex/shared/mina/archive/.20242323/year/year.txt (No such file or directory)
- Caused by: java.io.FileNotFoundException: /apps/fex/shared/mina/archive/.20242323/year/year.txt (No such file or directory)
Created 10-21-2024 12:42 PM
@nifier
Your PutFile issue is unrelated to the original query in this community question. It is better to start a new community question for unrelated queries, as mixed solutions can become confusing to others who may use the thread in the future.
That being said, this exception is caused by your NiFi FlowFile having a filename that contains a directory structure:
20242323/year/year.txt
This is not a valid filename to use with the PutFile processor. I am not sure where in your dataflow before PutFile the filename FlowFile attribute is being modified in such a way. You might be able to address this issue there (preferred).
Alternatively, you could use an UpdateAttribute processor before the PutFile processor to extract the directory structure from the filename.
If you want to recreate that directory structure on disk, append the path extracted from the filename to the "Directory" configured in the PutFile processor.
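As a rough illustration of that split, here is a minimal Python sketch (not NiFi code) that separates an S3-style key into its directory part and base filename, then builds the target directory by appending the directory part to a base directory. The helper names `split_key` and `target_directory` are hypothetical; the base directory is the one that appears in the error message above.

```python
import posixpath


def split_key(key):
    """Split an S3-style key into (relative_dir, base_name)."""
    return posixpath.split(key)


def target_directory(base_dir, key):
    """Append the key's directory part to base_dir; root-level keys stay in base_dir."""
    relative_dir, _ = split_key(key)
    return posixpath.join(base_dir, relative_dir) if relative_dir else base_dir


relative_dir, base_name = split_key("20242323/year/year.txt")
# relative_dir holds "20242323/year", base_name holds "year.txt"
directory = target_directory("/apps/fex/shared/mina/archive", "20242323/year/year.txt")
# directory holds "/apps/fex/shared/mina/archive/20242323/year"
```

In a NiFi flow, the same idea might be expressed in UpdateAttribute with Expression Language functions such as `substringBeforeLast` and `substringAfterLast` applied to the filename attribute, with the extracted path then referenced in the PutFile "Directory" property. Treat that mapping as an assumption to verify against your own flow, in particular for files that sit at the bucket root and have no directory part.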
Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created 10-22-2024 10:16 AM
Hi Matt,
Closing this query, as it is not related to the S3 issue.
Thanks for your response.
Created 10-22-2024 10:40 AM
Thanks Matt,
I was able to resolve the issue with your PutFile solution.