The ListS3 and FetchS3 processors in Apache NiFi are commonly used to retrieve objects from Amazon S3 buckets. Because IBM Cloud Object Storage provides an S3-compatible API, the same processors can also be configured to retrieve objects from IBM Cloud buckets.

Assume I have an IBM Cloud bucket that contains three CSV files:

IBM_Cloud_Bucket_objects.png

First, get the following from your IBM Cloud bucket configuration:

  • Bucket Name
  • Private Endpoint

ibm_bucket_configuration.png

Then, from the Service Credentials of your Cloud Object Storage instance, get:

  • Access Key ID
  • Secret Access Key

ibm_cloud_service_credentials.png

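Note: If you don't have Service Credentials for the storage instance, create a new one with HMAC set to "true" (https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-uhc-hmac-credentials-main). ListS3 and FetchS3 authenticate with the HMAC Access Key ID and Secret Access Key, not with an IBM Cloud API key.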
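
If you download the service credential as JSON instead of copying the keys from the console, the two HMAC values can be extracted with a few lines of Python. This is a minimal sketch: it assumes the keys sit under `cos_hmac_keys` as `access_key_id` and `secret_access_key` (verify this against your own credential), and `hmac_keys_from_credentials` is a hypothetical helper name.

```python
import json


def hmac_keys_from_credentials(credentials_json: str) -> tuple[str, str]:
    """Return (access_key_id, secret_access_key) from a downloaded service credential.

    Assumes the HMAC pair is nested under "cos_hmac_keys"; that block is expected
    only when the credential was created with HMAC enabled.
    """
    credentials = json.loads(credentials_json)
    keys = credentials.get("cos_hmac_keys")
    if not keys:
        # Without HMAC keys there is nothing to enter in ListS3/FetchS3.
        raise ValueError("no cos_hmac_keys in credential; create one with HMAC set to true")
    return keys["access_key_id"], keys["secret_access_key"]
```

The two returned values go into the Access Key ID and Secret Access Key properties of both processors.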

Confirm (or create) a bucket access policy that gives your IBM Cloud user permission to view and download objects (https://cloud.ibm.com/docs/cloud-object-storage?topic=cloud-object-storage-iam-bucket-permissions):

bucket_access_policy.png

With this setup confirmed, add ListS3 and FetchS3 processors to your NiFi canvas and connect them, similar to the following:

lists3_fetchs3_nifi_canvas.png

In the ListS3 configuration, enter the Bucket, Access Key ID, Secret Access Key, and Endpoint Override URL (the Private Endpoint from the bucket configuration):

ListS3_configuration_1.png

ListS3_configuration_2.png

Note: The Region property is ignored when the Endpoint Override URL property is used.
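
The ListS3 values entered above can be summarized as a property map. The Python sketch below assumes that Endpoint Override URL expects a full URL, so it prepends `https://` when the endpoint copied from the bucket configuration is a bare hostname. The helper names and the example values are hypothetical placeholders.

```python
def endpoint_override_url(endpoint: str) -> str:
    """Turn an endpoint copied from the bucket configuration into a full URL."""
    endpoint = endpoint.strip().rstrip("/")
    # Assumption: the property wants a scheme, so bare hostnames get https://.
    if "://" not in endpoint:
        endpoint = "https://" + endpoint
    return endpoint


def lists3_properties(bucket: str, access_key_id: str,
                      secret_access_key: str, endpoint: str) -> dict[str, str]:
    """Collect the ListS3 property values used in this walkthrough."""
    return {
        "Bucket": bucket,
        "Access Key ID": access_key_id,
        "Secret Access Key": secret_access_key,
        # Region is omitted: it is ignored when Endpoint Override URL is set.
        "Endpoint Override URL": endpoint_override_url(endpoint),
    }


if __name__ == "__main__":
    props = lists3_properties("my-bucket", "ACCESS_KEY_ID", "SECRET_ACCESS_KEY",
                              "s3.example.appdomain.cloud")
    print(props["Endpoint Override URL"])  # https://s3.example.appdomain.cloud
```

Normalizing the endpoint in one place keeps ListS3 and FetchS3 pointed at exactly the same URL.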

Run the ListS3 processor. It generates one FlowFile for each object in the bucket:

lists3_running.png

Listing the queue shows the details of each FlowFile:

lists3_queue_details.png
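
If the queue stays empty or is missing objects, listing the bucket outside NiFi with the same credentials and endpoint helps separate a NiFi configuration problem from a permissions problem. A minimal Python sketch, assuming the `boto3` package is installed (IBM Cloud Object Storage accepts S3 clients that sign with HMAC keys); `list_objects` and `as_flowfile_attributes` are hypothetical helpers, and the second one mirrors only a few of the attributes ListS3 writes (`filename`, `s3.bucket`, `s3.length`).

```python
def list_objects(bucket, access_key_id, secret_access_key, endpoint_url):
    """List every object in the bucket with boto3, following pagination."""
    import boto3  # lazy import: only this function needs boto3

    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint_url,
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    objects = []
    for page in s3.get_paginator("list_objects_v2").paginate(Bucket=bucket):
        # A page for an empty bucket has no "Contents" key.
        objects.extend(page.get("Contents", []))
    return objects


def as_flowfile_attributes(bucket, objects):
    """Mirror a subset of the attributes ListS3 sets: filename, s3.bucket, s3.length."""
    return [
        {"filename": obj["Key"], "s3.bucket": bucket, "s3.length": str(obj["Size"])}
        for obj in objects
    ]
```

Calling `as_flowfile_attributes(bucket, list_objects(bucket, key_id, secret, endpoint_url))` with the values entered in ListS3 should yield one entry per object, matching the FlowFiles in the queue.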

Now configure the FetchS3 processor the same way, with the Bucket, Access Key ID, Secret Access Key, and Endpoint Override URL:

FetchS3_configuration.png
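
FetchS3 needs the same connection settings as ListS3 plus an Object Key, whose default value `${filename}` picks up the object key that ListS3 stored in each FlowFile's `filename` attribute. The Python sketch below models that hand-off: `resolve` substitutes plain `${attribute}` references only (a tiny subset of NiFi Expression Language, not a reimplementation of it), and both function names are hypothetical.

```python
import re

# Plain attribute references such as ${filename} or ${s3.bucket}; no EL functions.
_ATTRIBUTE_REF = re.compile(r"\$\{([A-Za-z0-9_.\-]+)\}")


def resolve(expression: str, attributes: dict[str, str]) -> str:
    """Replace each ${name} with the FlowFile attribute value (missing names become "")."""
    return _ATTRIBUTE_REF.sub(lambda m: attributes.get(m.group(1), ""), expression)


def fetchs3_properties(bucket: str, access_key_id: str, secret_access_key: str,
                       endpoint_url: str, object_key: str = "${filename}") -> dict[str, str]:
    """Collect the FetchS3 property values; Object Key keeps its ${filename} default."""
    return {
        "Bucket": bucket,
        "Object Key": object_key,
        "Access Key ID": access_key_id,
        "Secret Access Key": secret_access_key,
        "Endpoint Override URL": endpoint_url,
    }


if __name__ == "__main__":
    props = fetchs3_properties("my-bucket", "ACCESS_KEY_ID", "SECRET_ACCESS_KEY",
                               "https://s3.example.appdomain.cloud")
    # Attributes as ListS3 would set them for one listed object (placeholder values).
    flowfile = {"filename": "data.csv", "s3.bucket": "my-bucket"}
    print(resolve(props["Object Key"], flowfile))  # data.csv
```

Because each FlowFile carries its own `filename`, one FetchS3 configuration serves every object that ListS3 emits.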

Run the FetchS3 processor to retrieve the three CSV files from the IBM Cloud bucket:

FetchS3_running.png