Created on 09-26-2022 01:31 AM - edited 09-26-2022 01:52 AM
Consider the following example from KNIME,
in which the Joiner processor knows which columns it receives from its incoming relationships (e.g., country). If one wants to implement such functionality for NiFi (e.g., in a custom processor UI), is it possible to obtain information about the processors that precede a given processor in the pipeline (e.g., the schema of their output)? Note that we need this information while the pipeline is being defined, not when it is executed.
Created 09-27-2022 10:53 AM
@morti I believe the answer is yes. For example, if I query a database and get the results as content, the schema for those results is an attribute of the FlowFile holding the content. Additionally, I could add attributes in my flow based on certain conditions, on other attributes, or even on the content from previous processors.
Created 10-10-2022 10:36 PM
@morti To add to this, this is essentially the same as (or at least very similar to) NiFi's Record API, where record processors require RecordReader/RecordWriter controller services in which the schema for the incoming/outgoing FlowFiles is defined. These processors can get their schema from a schema registry, have it hard-coded in their configuration, try to infer it, or simply rely on the schema that was attached to the FlowFile earlier in the flow.
I think it's worth looking into records in NiFi; they're designed specifically for well-defined data and use cases similar to what you described.
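As a rough illustration of why an explicit schema helps at design time: NiFi record schemas are commonly written in Avro schema (JSON) format, so the field names are available before any data flows. The sketch below is plain Python, not NiFi code; the two schemas and the helper names (field_names, shared_fields) are hypothetical and only show how a joiner-style UI could list the columns two inputs have in common.

```python
import json


def field_names(schema_text):
    """Return the top-level field names of an Avro record schema given as JSON text."""
    schema = json.loads(schema_text)
    if schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return [field["name"] for field in schema["fields"]]


def shared_fields(left_schema_text, right_schema_text):
    """Return the field names present in both schemas, in left-schema order."""
    right = set(field_names(right_schema_text))
    return [name for name in field_names(left_schema_text) if name in right]


# Hypothetical schemas, standing in for what two upstream record writers declare.
CUSTOMERS = json.dumps({
    "type": "record",
    "name": "Customer",
    "fields": [
        {"name": "customer_id", "type": "long"},
        {"name": "country", "type": "string"},
    ],
})

ORDERS = json.dumps({
    "type": "record",
    "name": "Order",
    "fields": [
        {"name": "order_id", "type": "long"},
        {"name": "country", "type": "string"},
        {"name": "amount", "type": "double"},
    ],
})

if __name__ == "__main__":
    # Candidate join columns, known without running the flow.
    print(shared_fields(CUSTOMERS, ORDERS))
```

This only works when the schema is declared up front (registry or schema text); an inferred schema is not known until data actually arrives.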