01-14-2019
08:28 AM
I have a URL I hit that returns a JSON payload like this:

[
"\/en\/download-data\/546457547?token=ABCDEFGHIJKL123456",
"\/en\/download-data\/34543534?token=ABCDEFGHIJKL123456",
"\/en\/download-data\/1423422?token=ABCDEFGHIJKL123456",
"\/en\/download-data\/97534444?token=ABCDEFGHIJKL123456"
]

Each of the URLs in the response is itself a text file payload. For each file:

1. I want to download each record in the JSON array response into its own FlowFile for processing (so I'll need to prepend the URL I just hit to get this response, since each entry is a relative path).
2. Each downloaded FlowFile should be named based on the filename in the Content-Disposition header.
3. Each FlowFile should get an attribute named blockId that holds a substring of the file name (as resolved in requirement 2). For example, a downloaded file named bazaz.txt would have blockId:bazaz in its attributes.

So far I have this processor flow:

1. GetHTTP: Download the metadata URL that points to the files.
2. SplitRecord or PartitionRecord?: Break up the response from #1 into separate FlowFiles. These processors don't seem quite right, since I want the response from #1 to dictate how many FlowFiles get created, based on the array of URLs it returns. The response from calling each of those URLs will be the content of each generated FlowFile.
3. UpdateAttribute: Set the blockId attribute based on the filename using Expression Language.

Things get complex when trying to use #1 as the basis for the input FlowFiles. I'm new to NiFi, so any help with which processors to use and how the flow should be set up is much appreciated.
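To make the three requirements concrete, here is a minimal Python sketch of the transformation I'm after, outside of NiFi. It is only an illustration of the expected inputs and outputs, not a NiFi configuration. The function names and the https://example.com/en/metadata URL are placeholders I made up, and I'm assuming the server sends a header of the form attachment; filename="bazaz.txt".

```python
import json
from email.message import Message
from urllib.parse import urljoin


def resolve_urls(metadata_url, payload):
    """Turn the JSON array of relative paths into absolute download URLs.

    metadata_url is the URL that returned the payload (hypothetical here).
    """
    return [urljoin(metadata_url, path) for path in json.loads(payload)]


def filename_from_disposition(header_value):
    """Extract the filename parameter from a Content-Disposition header value.

    Returns None when the header carries no filename parameter.
    """
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


def block_id(filename):
    """Drop the last extension, e.g. 'bazaz.txt' -> 'bazaz'."""
    return filename.rsplit(".", 1)[0]


if __name__ == "__main__":
    # Placeholder metadata URL; the real host is not shown in this post.
    base = "https://example.com/en/metadata"
    payload = r'["\/en\/download-data\/546457547?token=ABCDEFGHIJKL123456"]'
    for url in resolve_urls(base, payload):
        print(url)

    name = filename_from_disposition('attachment; filename="bazaz.txt"')
    print(name, block_id(name))
```

Each absolute URL would become one FlowFile, the filename would come from that download's response header, and blockId would be derived from the filename.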
Labels:
- Apache HBase
- Apache NiFi