Created 10-21-2024 07:23 AM
Hi
I am new to NiFi and trying to get this resolved.
We have an S3 bucket with the structure below.
We are using the pattern below to get the files recursively.
This works if the files are directly in the home directory of the bucket,
but it doesn't work for files in a folder or subfolder.
Please let me know how to get the files recursively.
Note: The same structure also has to be created locally with PutFile, which I think is possible once I get the files from FetchS3Object.
Created 10-21-2024 08:03 AM
@nifier -
I emulated the same setup as you via a testing S3 bucket.
Using an access key ID and secret access key that had full access to all of S3, I was able to receive all the objects (including those in nested folders) from that bucket.
A couple of follow-up questions:
1. In your ListS3 processor configuration, do you have anything set for Prefix or Delimiter? I just want to make sure, because that could be filtering out some files/directories coming from S3.
2. What is your IAM role, and what is the policy on the bucket you are trying to consume from? Are you certain that the role you are using can access all the objects in the bucket?
Created 10-21-2024 08:58 AM
Thank you for your reply.
1. In ListS3, I'm able to list the files, no issues here.
The issue is with the next step, FetchS3Object, which gives me the error below.
ERROR
FetchS3Object[id=99843c65-eeb7-1140-824f-1258d088506d] Failed to retrieve S3 Object for FlowFile[filename=20242323/year/year.txt];
routing to failure: com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 404;
Error Code: NoSuchKey; Request ID: tx00000631cd5ef67d0d1fd-006716305c-c9f4b0-ttce-stage-singlesite-zone;
S3 Extended Request ID: c9f4b0-ttce-stage-singlesite-zone-ttce-stage-singlesite-zonegroup; Proxy: null),
S3 Extended Request ID: c9f4b0-ttce-stage-singlesite-zone-ttce-stage-singlesite-zonegroup
2. Yes, we do have access to the bucket. We are able to get the files recursively from a shell script using AWS S3 commands.
Created 10-21-2024 09:34 AM
What does your configuration for the FetchS3Object processor look like?
I would make sure you're pointing to the correct region in the FetchS3Object processor, and that your AWSCredentialsService or the credentials in the processor are set correctly.
Created 10-21-2024 10:12 AM
Credentials and region are correct, as I'm able to fetch the files if they are directly under the home directory (20242323) of the bucket. The issue is fetching files that are under folders/subfolders, like "month" or "year" in this case.
ListS3 Properties
FetchS3Object Properties
Created 10-21-2024 11:32 AM
For the record:
I was able to resolve the fetching issue; the port number was missing from the overridden URL.
However, now I get a different error when writing the file to local disk.
PutFile[id=07413ab5-d3d2-1e9a-99b0-c0f57682f17c] Penalizing FlowFile[filename=20242323/year/year.txt] and transferring to failure: org.apache.nifi.processor.exception.FlowFileAccessException: Failed to export StandardFlowFileRecord[uuid=35019b7d-1e33-44db-8432-6a54b2f6586e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1729535188764-436, container=default, section=436], offset=70, length=16],offset=0,name=20242323/year/year.txt,size=16] to /apps/fex/shared/mina/archive/.20242323/year/year.txt due to java.io.FileNotFoundException: /apps/fex/shared/mina/archive/.20242323/year/year.txt (No such file or directory)
- Caused by: java.io.FileNotFoundException: /apps/fex/shared/mina/archive/.20242323/year/year.txt (No such file or directory)
Created 10-21-2024 12:42 PM
@nifier
Your PutFile issue is unrelated to the original query in this community question. It is better to start a new community question for unrelated queries, as solutions can become confusing to others who may use the thread in the future.
That being said, this exception is caused by your NiFi FlowFile having a filename that contains a directory structure:
20242323/year/year.txt
This is not a valid filename to use with the PutFile processor. I am not sure where in your dataflow before PutFile the filename FlowFile attribute is being modified in such a way. You might be able to address this issue there (preferred).
You could also use an UpdateAttribute processor before the PutFile processor to extract the directory structure from the filename.
If you want to maintain that directory structure on disk, append the extracted path from the filename to the "Directory" configured in the PutFile processor.
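To illustrate the split described above, here is a minimal Python sketch. It is not NiFi configuration, just the same string logic: separate the directory portion of the key from the bare filename, then append that portion to the base directory. The helper names split_key and target_directory are hypothetical. In NiFi itself, one way to do this (an assumption about your flow, not something I have tested against it) would be UpdateAttribute with Expression Language functions such as substringBeforeLast('/') and substringAfterLast('/') applied to the filename attribute.

```python
from pathlib import PurePosixPath


def split_key(filename: str) -> tuple[str, str]:
    """Split an S3-style key into (relative directory, bare filename)."""
    key = PurePosixPath(filename)
    # A key with no '/' has parent '.', which we treat as "no directory".
    parent = "" if str(key.parent) == "." else str(key.parent)
    return parent, key.name


def target_directory(base_dir: str, filename: str) -> str:
    """Append the key's directory portion to the base output directory."""
    parent, _ = split_key(filename)
    return str(PurePosixPath(base_dir) / parent) if parent else base_dir


# Values taken from the error message in this thread.
rel_dir, name = split_key("20242323/year/year.txt")
print(rel_dir)  # 20242323/year
print(name)     # year.txt
print(target_directory("/apps/fex/shared/mina/archive", "20242323/year/year.txt"))
# /apps/fex/shared/mina/archive/20242323/year
```

With the filename reduced to year.txt and the Directory set to the computed path, PutFile no longer has to treat a path as a filename.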
Please help our community thrive. If you found any of the suggestions/solutions provided helped you with solving your issue or answering your question, please take a moment to login and click "Accept as Solution" on one or more of them that helped.
Thank you,
Matt
Created 10-22-2024 10:16 AM
Hi Matt,
Closing this query, as it's not related to the S3 issue.
Thanks for your response.
Created 10-22-2024 10:40 AM
Thanks Matt,
Was able to resolve the issue with your putFile solution.