Turn the response of a URL into multiple flow files

New Contributor

I have a URL I hit that returns a JSON payload like this:

[
	"\/en\/download-data\/546457547?token=ABCDEFGHIJKL123456",
	"\/en\/download-data\/34543534?token=ABCDEFGHIJKL123456",
	"\/en\/download-data\/1423422?token=ABCDEFGHIJKL123456",
	"\/en\/download-data\/97534444?token=ABCDEFGHIJKL123456"
]

Each URL in the response points to a text file.

For each file:

  1. I want to download each entry in the JSON array response into its own FlowFile for processing. Because the entries are relative paths, I'll need to resolve each one against the URL I just hit.
  2. Each downloaded FlowFile should be named after the filename in the Content-Disposition header.
  3. Each FlowFile should get an attribute named blockId whose value is a substring of the filename (as resolved in requirement 2). For example, a downloaded file named bazaz.txt would have blockId: bazaz in its attributes.
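To make requirements 1 and 2 concrete outside of NiFi, here is a minimal Python sketch of the logic only. It is not a NiFi configuration. The metadata URL `https://example.com/en/metadata` and the function names are assumptions made up for illustration; no network request is made.

```python
import json
from email.message import Message
from typing import List, Optional
from urllib.parse import urljoin


def resolve_download_urls(metadata_url: str, payload: str) -> List[str]:
    """Turn the JSON array of relative paths into absolute URLs."""
    # json.loads converts the escaped "\/" sequences into plain "/".
    relative_paths = json.loads(payload)
    # urljoin resolves each relative path against the URL that returned it.
    return [urljoin(metadata_url, path) for path in relative_paths]


def filename_from_content_disposition(header_value: str) -> Optional[str]:
    """Return the filename parameter of a Content-Disposition value, or None."""
    message = Message()
    message["Content-Disposition"] = header_value
    return message.get_filename()


if __name__ == "__main__":
    # Hypothetical metadata URL; the payload mirrors the sample above.
    payload = r'["\/en\/download-data\/546457547?token=ABCDEFGHIJKL123456"]'
    for url in resolve_download_urls("https://example.com/en/metadata", payload):
        print(url)
    print(filename_from_content_disposition('attachment; filename="bazaz.txt"'))
```

Inside NiFi the same two steps would be handled by processors and attributes rather than by code like this.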

So far I have this processor flow:

  1. GetHTTP: Download the metadata URL that points to the files.
  2. SplitRecord or PartitionRecord?: Break up the response from #1 into different FlowFiles. These processors don't seem quite right since I want the response from #1 to dictate how many flowfiles get created based on the array of URLs returned in #1. The response of calling each URL from the response of #1 will be the content of each flowfile that gets generated.
  3. UpdateAttribute: Set the blockId property based on the filename using expression language.

Things get complex when trying to use the response from #1 as the basis for the input FlowFiles. I'm new to NiFi, so any help with which processors to use and how the flow should be set up is much appreciated.

1 REPLY

Re: Turn the response of a URL into multiple flow files

Super Guru
@John Perkins

Configure the SplitRecord processor's Record Reader and Record Writer controller services and set Records Per Split to 1. Each record (here, each URL in the array) will then end up in its own FlowFile.
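As a rough illustration of what a Records Per Split value of 1 means, the following Python sketch chunks a list of records into groups of a given size. It only models the idea; `split_records` is a hypothetical helper, not part of NiFi.

```python
from typing import Any, Iterator, List


def split_records(records: List[Any], records_per_split: int = 1) -> Iterator[List[Any]]:
    """Yield consecutive chunks holding at most records_per_split records."""
    if records_per_split < 1:
        raise ValueError("records_per_split must be at least 1")
    for start in range(0, len(records), records_per_split):
        yield records[start:start + records_per_split]


if __name__ == "__main__":
    paths = [
        "/en/download-data/546457547?token=ABCDEFGHIJKL123456",
        "/en/download-data/34543534?token=ABCDEFGHIJKL123456",
    ]
    # With a split size of 1, each chunk stands in for one output FlowFile.
    for chunk in split_records(paths):
        print(chunk)
```

With four URLs in the response, a split size of 1 would give four chunks, which matches the goal of letting the response decide how many FlowFiles are created.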

Flow:

1. GetHTTP
2. SplitRecord
3. ExtractText // extract the content of the FlowFile and keep it as an attribute
4. UpdateAttribute // use NiFi Expression Language on that attribute; use the Advanced UI if you need conditional rules
-- other processors
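Steps 3 and 4 of this flow can be modelled in plain Python as a sketch of the idea, with FlowFile attributes represented as a dictionary. The attribute name `download.path`, the regular expression, and both function names are assumptions chosen for illustration. In NiFi itself, the blockId step would more likely be an Expression Language call such as substringBeforeLast applied to the filename attribute.

```python
import re
from typing import Dict


def extract_text(content: str, attribute: str, pattern: str,
                 attributes: Dict[str, str]) -> Dict[str, str]:
    """Copy the first capture group matched in the content into an attribute."""
    updated = dict(attributes)
    match = re.search(pattern, content)
    if match:
        updated[attribute] = match.group(1)
    return updated


def add_block_id(attributes: Dict[str, str]) -> Dict[str, str]:
    """Add blockId as the filename without its last extension."""
    filename = attributes["filename"]
    stem, dot, _extension = filename.rpartition(".")
    updated = dict(attributes)
    # If the filename has no dot, keep the whole name as the blockId.
    updated["blockId"] = stem if dot else filename
    return updated


if __name__ == "__main__":
    content = "/en/download-data/1423422?token=ABCDEFGHIJKL123456\n"
    attrs = extract_text(content, "download.path", r"(\S+)", {})
    attrs["filename"] = "bazaz.txt"  # would come from Content-Disposition
    print(add_block_id(attrs))
```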

Refer to this link for more details on using the ExtractText and UpdateAttribute processors.