Each of the URLs in the response is itself a text file payload.
For each file:
1. I want to download each record in the JSON array response into its own FlowFile for processing (each record is a relative path, so I'll need to prepend the URL I just hit to get this response).
2. Each downloaded FlowFile should be named after the filename in the Content-Disposition header.
3. Each FlowFile should have an attribute named blockId whose value is a substring of the file name (as resolved in the 2nd requirement). For example, a downloaded file named bazaz.txt would have blockId: bazaz in its attributes.
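The three requirements above can be sketched outside NiFi in plain Python, purely to pin down the intended behavior. The function names and example URLs are hypothetical, and treating blockId as the filename without its extension is an assumption based on the bazaz.txt example.

```python
from email.message import Message
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urljoin


def resolve_url(metadata_url: str, relative_path: str) -> str:
    """Requirement 1: turn a relative path from the response into an absolute URL."""
    return urljoin(metadata_url, relative_path)


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Requirement 2: read the filename parameter of a Content-Disposition header."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


def block_id(filename: str) -> str:
    """Requirement 3 (assumed rule): the filename without its final extension."""
    return PurePosixPath(filename).stem


if __name__ == "__main__":
    # Hypothetical metadata URL and relative record, for illustration only.
    url = resolve_url("http://example.com/meta/index.json", "files/bazaz.txt")
    name = filename_from_content_disposition('attachment; filename="bazaz.txt"')
    print(url, name, block_id(name))
```

Inside NiFi, the last step would likely be an UpdateAttribute property using Expression Language, for example `${filename:substringBeforeLast('.')}`, assuming the filename attribute has already been set from the header.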
So far I have this processor flow:
1. GetHTTP: Download the metadata URL that points to the files.
2. SplitRecord or PartitionRecord?: Break up the response from #1 into separate FlowFiles. These processors don't seem quite right, since I want the array of URLs returned in #1 to dictate how many FlowFiles get created. The response from calling each of those URLs will be the content of the corresponding FlowFile.
3. UpdateAttribute: Set the blockId attribute based on the filename using Expression Language.
Things get complex when trying to use #1 as the basis for the input FlowFiles. I'm new to NiFi, so any help with which processors to use and how the flow should be set up is much appreciated.
Configure the SplitRecord processor's Record Reader/Writer controller services and set Records Per Split to 1. Each record will then end up in its own flowfile.
3. ExtractText // extract the content of the flowfile and keep it as an attribute
4. UpdateAttribute // use NiFi Expression Language on that attribute, plus the Advanced usage if you need to make decisions
Refer to this link for more details regarding the usage of the ExtractText and UpdateAttribute processors.
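To make the effect of splitting with Records Per Split set to 1 concrete, here is a minimal Python sketch of the same idea: one output document per element of the JSON array. It is not how NiFi implements SplitRecord; the function name and the `url` field in the sample payload are hypothetical.

```python
import json
from typing import List


def split_records(payload: str) -> List[str]:
    """Return one JSON document per element of a JSON array."""
    records = json.loads(payload)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    # Each element becomes its own serialized record, like one flowfile each.
    return [json.dumps(record) for record in records]


if __name__ == "__main__":
    # Hypothetical response body; the real field names are not shown in the question.
    sample = '[{"url": "files/bazaz.txt"}, {"url": "files/quux.txt"}]'
    for content in split_records(sample):
        print(content)
```

The number of outputs follows directly from the length of the array, which is the behavior the question asks for: the response from the first request dictates how many flowfiles exist downstream.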