Member since
01-14-2019
01-14-2019
08:28 AM
I know Apache NiFi comes prepackaged with AvroSchemaRegistry, but I'm wondering whether I am painting myself into a corner by specifying the schema definition directly in the input processors. Here are some points I would appreciate an opinion on:

1. I don't really care about schema evolution and would likely just end up changing the schema definition if and when it changes. I believe this points me more toward just using AvroSchemaRegistry and embedding the schema directly in the input processor.

2. Is it beneficial to use Hortonworks Schema Registry because it supports more schema types than just Avro? Some vendors may provide a schema in a format other than Avro, in which case I would have to convert it to an Avro definition. Using Hortonworks Schema Registry would let me process the default schema they provide, assuming it is in a format supported by the schema registry (e.g. JSON, CSV, etc.).

3. I'm new to converting files using schema mappings. How do you handle renaming a field? Assuming schema A maps fields for File A and schema B maps fields for File B, I know that a field F must match in both schemas, but what about when you want to rename that field to Fn before it is output to File B?

I'm just trying to figure out which schema registry I should use (built-in vs. external). While I do see some cool features with the external option, I'm still thinking it might be overkill. Some additional use cases would be appreciated.
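To make the renaming question concrete: the Avro specification lets a field in the reading schema list `aliases`, so a field written as F can be read under the name Fn. The sketch below only imitates that idea on plain Python dicts. It is not NiFi code, the schema and record are made up for illustration, and whether a particular NiFi reader or writer resolves aliases this way is an assumption that would need checking.

```python
import json

# Hypothetical output schema (schema B). The field is named "Fn" here and
# carries an Avro-style alias pointing at its name in the input ("F").
SCHEMA_B = json.loads("""
{
  "type": "record",
  "name": "FileB",
  "fields": [
    {"name": "id", "type": "string"},
    {"name": "Fn", "type": "string", "aliases": ["F"]}
  ]
}
""")


def project(record, schema):
    """Copy a dict record onto `schema`, resolving renamed fields by alias."""
    out = {}
    for field in schema["fields"]:
        # Try the field's own name first, then each alias in order.
        candidates = [field["name"]] + field.get("aliases", [])
        for name in candidates:
            if name in record:
                out[field["name"]] = record[name]
                break
        else:
            raise KeyError(f"no value for field {field['name']!r}")
    return out


# Made-up record as it would look under schema A.
record_a = {"id": "42", "F": "hello"}
record_b = project(record_a, SCHEMA_B)
print(record_b)
```

With this approach the rename lives in the output schema itself rather than in a separate mapping step, which may matter when deciding where the schema definitions should be kept.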
01-14-2019
08:28 AM
I have a URL that returns a JSON payload like this:
[
"\/en\/download-data\/546457547?token=ABCDEFGHIJKL123456",
"\/en\/download-data\/34543534?token=ABCDEFGHIJKL123456",
"\/en\/download-data\/1423422?token=ABCDEFGHIJKL123456",
"\/en\/download-data\/97534444?token=ABCDEFGHIJKL123456"
]

Each of the URLs in the response is itself a text file payload. For each file:

1. I want to download each record in the JSON array response into its own FlowFile for processing (so I'll need to prepend the URL I just hit to get this response, since each entry is a relative path).
2. Each resulting FlowFile that is downloaded should be named based on the filename in the Content-Disposition header.
3. Each FlowFile should have an attribute named blockId whose value is a substring of the file name (as resolved from the 2nd requirement). For example, a downloaded file named bazaz.txt would have blockId:bazaz in its attributes.

So far I have this processor flow:

1. GetHTTP: Download the metadata URL that points to the files.
2. SplitRecord or PartitionRecord?: Break up the response from #1 into different FlowFiles. These processors don't seem quite right, since I want the array of URLs returned in #1 to dictate how many FlowFiles get created. The response from calling each URL in #1 will be the content of each FlowFile that gets generated.
3. UpdateAttribute: Set the blockId property based on the filename using expression language.

Things get complex when trying to use #1 as the basis for the input FlowFiles. I'm new to NiFi, so any help with which processors to use and how the flow should be set up is much appreciated.
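The three requirements above can be sketched outside NiFi to pin down the expected inputs and outputs of each step. This is plain Python, not a NiFi flow. `METADATA_URL` is a hypothetical placeholder because the real URL is not given, and the sample Content-Disposition value is made up. In NiFi, SplitJson and InvokeHTTP look like candidate processors for the split and download steps, but that mapping is an unverified assumption.

```python
import json
from email.message import Message
from pathlib import PurePosixPath
from urllib.parse import urljoin

# Hypothetical metadata URL; the real one is not given in the post.
METADATA_URL = "https://example.com/en/metadata"


def resolve_links(payload, base_url):
    """Turn the JSON array of relative paths into absolute URLs."""
    # json.loads decodes the escaped "\/" sequences to plain "/".
    return [urljoin(base_url, path) for path in json.loads(payload)]


def filename_from_content_disposition(header_value):
    """Extract the filename parameter from a Content-Disposition header."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    return msg.get_filename()


def block_id(filename):
    """File name without its extension, e.g. bazaz.txt -> bazaz."""
    return PurePosixPath(filename).stem


# One entry from the sample response, kept as a raw string so the
# backslashes reach the JSON parser unchanged.
payload = r'["\/en\/download-data\/546457547?token=ABCDEFGHIJKL123456"]'
urls = resolve_links(payload, METADATA_URL)

# Made-up header value standing in for a real download response.
name = filename_from_content_disposition('attachment; filename="bazaz.txt"')
print(urls[0], name, block_id(name))
```

Each element of `urls` corresponds to one FlowFile to be created, so the number of FlowFiles follows directly from the length of the array in the first response.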