<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Get files recursively from S3 bucket in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395643#M248991</link>
    <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/117603"&gt;@nifier&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What does your configuration for the FetchS3Object processor look like?&lt;BR /&gt;&lt;BR /&gt;I would make sure you're pointing to the correct region in the FetchS3Object processor, and that your AWS Credentials Provider Service or the credentials in the processor are set correctly.&lt;/P&gt;</description>
    <pubDate>Mon, 21 Oct 2024 16:34:33 GMT</pubDate>
    <dc:creator>drewski7</dc:creator>
    <dc:date>2024-10-21T16:34:33Z</dc:date>
    <item>
      <title>Get files recursively from S3 bucket</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395630#M248987</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;I am new to NiFi and trying to get this resolved.&lt;/P&gt;&lt;P&gt;We have an S3 bucket with the structure below.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-10-21 at 7.47.21 PM.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42265i415EB9022FB65569/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-10-21 at 7.47.21 PM.png" alt="Screenshot 2024-10-21 at 7.47.21 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;I am using the pattern shown below to get the files recursively. It works for files in the home directory of the bucket, but not for files in folders or subfolders.&lt;/P&gt;&lt;P&gt;Please let me know how to get the files recursively.&lt;/P&gt;&lt;P&gt;Note: the same directory structure also has to be recreated locally with PutFile, which I think is possible once I get the files from FetchS3Object.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-10-21 at 7.48.15 PM.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42266i7D919660F602E13B/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-10-21 at 7.48.15 PM.png" alt="Screenshot 2024-10-21 at 7.48.15 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 21 Oct 2024 14:23:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395630#M248987</guid>
      <dc:creator>nifier</dc:creator>
      <dc:date>2024-10-21T14:23:33Z</dc:date>
    </item>
    <item>
      <title>Re: Get files recursively from S3 bucket</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395634#M248988</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/117603"&gt;@nifier&lt;/a&gt;&amp;nbsp;-&lt;/P&gt;&lt;P&gt;I emulated your setup using a test S3 bucket.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="drewski7_0-1729522787407.png" style="width: 574px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42267i064A9EE2CB95A9CC/image-dimensions/574x185?v=v2" width="574" height="185" role="button" title="drewski7_0-1729522787407.png" alt="drewski7_0-1729522787407.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Using an access key ID and secret access key with full access to S3, I was able to retrieve all the objects from that bucket, including those in nested folders.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="drewski7_1-1729522961080.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42268i02F59A9DCAECF0EE/image-size/medium?v=v2&amp;amp;px=400" role="button" title="drewski7_1-1729522961080.png" alt="drewski7_1-1729522961080.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="drewski7_2-1729523003012.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42269i63E9B1687658057E/image-size/medium?v=v2&amp;amp;px=400" role="button" title="drewski7_2-1729523003012.png" alt="drewski7_2-1729523003012.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;A couple of follow-up questions:&lt;BR /&gt;&lt;BR /&gt;1. In your ListS3 processor configuration, do you have anything set for Prefix or Delimiter? I want to rule that out, because either one could filter out some files or directories coming from S3.&lt;BR /&gt;&lt;BR /&gt;2. What IAM role are you using, and what is the policy on the bucket you are consuming from? Are you certain that the role you are using can access all the objects in the bucket?&lt;/P&gt;</description>
      <pubDate>Mon, 21 Oct 2024 15:03:31 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395634#M248988</guid>
      <dc:creator>drewski7</dc:creator>
      <dc:date>2024-10-21T15:03:31Z</dc:date>
    </item>
    <item>
      <title>Re: Get files recursively from S3 bucket</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395636#M248989</link>
      <description>&lt;P&gt;Thank you for your reply.&lt;/P&gt;&lt;P&gt;1. ListS3 lists the files with no issues.&lt;/P&gt;&lt;P&gt;The issue is with the next step, FetchS3Object, which gives me the error below.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-10-21 at 9.27.12 PM.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42270i145F7FEBDCCF4077/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-10-21 at 9.27.12 PM.png" alt="Screenshot 2024-10-21 at 9.27.12 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;ERROR&lt;BR /&gt;FetchS3Object[id=99843c65-eeb7-1140-824f-1258d088506d] Failed to retrieve S3 Object for FlowFile[filename=20242323/year/year.txt];&lt;BR /&gt;routing to failure: com.amazonaws.services.s3.model.AmazonS3Exception: null (Service: Amazon S3; Status Code: 404;&lt;BR /&gt;Error Code: NoSuchKey; Request ID: tx00000631cd5ef67d0d1fd-006716305c-c9f4b0-ttce-stage-singlesite-zone;&lt;BR /&gt;S3 Extended Request ID: c9f4b0-ttce-stage-singlesite-zone-ttce-stage-singlesite-zonegroup; Proxy: null),&lt;BR /&gt;S3 Extended Request ID: c9f4b0-ttce-stage-singlesite-zone-ttce-stage-singlesite-zonegroup&lt;/P&gt;&lt;P&gt;2. Yes, we do have access to the bucket. We can retrieve the files recursively from a shell script using AWS S3 commands.&lt;/P&gt;</description>
      <pubDate>Mon, 21 Oct 2024 15:58:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395636#M248989</guid>
      <dc:creator>nifier</dc:creator>
      <dc:date>2024-10-21T15:58:30Z</dc:date>
    </item>
    <item>
      <title>Re: Get files recursively from S3 bucket</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395643#M248991</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/117603"&gt;@nifier&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;What does your configuration for the FetchS3Object processor look like?&lt;BR /&gt;&lt;BR /&gt;I would make sure you're pointing to the correct region in the FetchS3Object processor, and that your AWS Credentials Provider Service or the credentials in the processor are set correctly.&lt;/P&gt;</description>
      <pubDate>Mon, 21 Oct 2024 16:34:33 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395643#M248991</guid>
      <dc:creator>drewski7</dc:creator>
      <dc:date>2024-10-21T16:34:33Z</dc:date>
    </item>
    <item>
      <title>Re: Get files recursively from S3 bucket</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395647#M248992</link>
      <description>&lt;P&gt;The credentials and region are correct: I am able to fetch files that sit directly under the home directory (20242323) of the bucket. The issue is fetching files under folders or subfolders, such as "month" or "year" in this case.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="nifier_0-1729530590518.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42278iC8AC4E64DCD95805/image-size/medium?v=v2&amp;amp;px=400" role="button" title="nifier_0-1729530590518.png" alt="nifier_0-1729530590518.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;ListS3 Properties&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-10-21 at 10.33.06 PM.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42276iD675BB445DEBBC01/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-10-21 at 10.33.06 PM.png" alt="Screenshot 2024-10-21 at 10.33.06 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;FetchS3Object Properties&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-10-21 at 10.32.39 PM.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42277i0FCC022C5B404A84/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-10-21 at 10.32.39 PM.png" alt="Screenshot 2024-10-21 at 10.32.39 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 21 Oct 2024 17:12:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395647#M248992</guid>
      <dc:creator>nifier</dc:creator>
      <dc:date>2024-10-21T17:12:08Z</dc:date>
    </item>
    <item>
      <title>Re: Get files recursively from S3 bucket</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395653#M248993</link>
      <description>&lt;P&gt;For the record: I was able to resolve the fetching issue. The port number was missing from the overridden endpoint URL.&lt;/P&gt;&lt;P&gt;However, I now get a different error when writing the file to local disk:&lt;/P&gt;&lt;P&gt;PutFile[id=07413ab5-d3d2-1e9a-99b0-c0f57682f17c] Penalizing FlowFile[filename=20242323/year/year.txt] and transferring to failure: org.apache.nifi.processor.exception.FlowFileAccessException: Failed to export StandardFlowFileRecord[uuid=35019b7d-1e33-44db-8432-6a54b2f6586e,claim=StandardContentClaim [resourceClaim=StandardResourceClaim[id=1729535188764-436, container=default, section=436], offset=70, length=16],offset=0,name=20242323/year/year.txt,size=16] to /apps/fex/shared/mina/archive/.20242323/year/year.txt due to java.io.FileNotFoundException: /apps/fex/shared/mina/archive/.20242323/year/year.txt (No such file or directory)&lt;BR /&gt;- Caused by: java.io.FileNotFoundException: /apps/fex/shared/mina/archive/.20242323/year/year.txt (No such file or directory)&lt;/P&gt;</description>
      <pubDate>Mon, 21 Oct 2024 18:32:43 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395653#M248993</guid>
      <dc:creator>nifier</dc:creator>
      <dc:date>2024-10-21T18:32:43Z</dc:date>
    </item>
    <item>
      <title>Re: Get files recursively from S3 bucket</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395659#M248994</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/117603"&gt;@nifier&lt;/a&gt;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;Your PutFile issue is unrelated to the original query in this community question. It is better to start a new community question for unrelated queries, as the solutions can become confusing to others who may use this thread in the future.&lt;BR /&gt;&lt;BR /&gt;That being said, this exception is caused by your NiFi FlowFile having a filename that contains a directory structure:&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;20242323/year/year.txt&lt;/LI-CODE&gt;&lt;P&gt;This is not a valid filename to use with the PutFile processor. I am not sure where in your dataflow before PutFile the filename FlowFile attribute is being modified this way. You might be able to address the issue there (preferred).&lt;BR /&gt;&lt;BR /&gt;Alternatively, you could use an UpdateAttribute processor before PutFile to extract the directory structure from the filename.&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="MattWho_0-1729539444091.png" style="width: 714px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/42279i3757E88729716A68/image-dimensions/714x500?v=v2" width="714" height="500" role="button" title="MattWho_0-1729539444091.png" alt="MattWho_0-1729539444091.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;If you want to recreate that directory structure locally, append the path extracted from the filename to the "Directory" configured in the PutFile processor.&lt;/P&gt;&lt;P&gt;Thank you,&lt;BR /&gt;Matt&lt;/P&gt;</description>
      <pubDate>Mon, 21 Oct 2024 19:42:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395659#M248994</guid>
      <dc:creator>MattWho</dc:creator>
      <dc:date>2024-10-21T19:42:18Z</dc:date>
    </item>
    <item>
      <title>Re: Get files recursively from S3 bucket</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395750#M249014</link>
      <description>&lt;P&gt;Hi Matt,&lt;/P&gt;&lt;P&gt;Closing this query, as it is not related to the S3 issue.&lt;/P&gt;&lt;P&gt;Thanks for your response.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Oct 2024 17:16:00 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395750#M249014</guid>
      <dc:creator>nifier</dc:creator>
      <dc:date>2024-10-22T17:16:00Z</dc:date>
    </item>
    <item>
      <title>Re: Get files recursively from S3 bucket</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395751#M249015</link>
      <description>&lt;P&gt;Thanks Matt,&lt;/P&gt;&lt;P&gt;I was able to resolve the issue with your PutFile solution.&lt;/P&gt;</description>
      <pubDate>Tue, 22 Oct 2024 17:40:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Get-files-recursively-from-S3-bucket/m-p/395751#M249015</guid>
      <dc:creator>nifier</dc:creator>
      <dc:date>2024-10-22T17:40:23Z</dc:date>
    </item>
  </channel>
</rss>

