Created 09-16-2016 05:44 PM
Hello. I'm trying to create a file-type filtering system in NiFi. My team plans to ingest various types of files from Kafka and process their metadata for various purposes.
For now, I'm simply trying to understand how to route files based on their extension (e.g. .txt, .tif) from Kafka to the appropriate custom extractors that we're developing. Given my unfamiliarity with NiFi processor coding, I'm not sure how best to accomplish this. As a Java programmer, I'd imagine creating an abstract class, then extending it with concrete versions for each file type we need. For example:
fileName.endsWith(".xml")
-> send to the appropriate custom extractor, etc.
Based on my research thus far, I suspect that the RouteOnAttribute processor will be core to this, but I'm not 100% sure, and we lack a NiFi guru.
I just need a pointer in the right direction on how to start designing this pipeline filter within NiFi. Responses are appreciated.
P.S. If anyone can point out templates that may help, that would be appreciated as well.
Created on 09-16-2016 07:53 PM - edited 08-18-2019 05:20 AM
The NiFi Expression Language can be used to route your data based on file extensions as you have described.
When NiFi ingests data, a FlowFile is created. That FlowFile is a combination of the original content and metadata about that content. Upon ingest, some metadata is created for every FlowFile. One of those attributes is named "filename" and contains the original filename of the ingested file.
The RouteOnAttribute processor can use the NiFi Expression Language to evaluate the FlowFile's "filename" attribute for routing purposes.
In the RouteOnAttribute processor, you would need to add a new property for each file extension you want to look for.
Each newly added property becomes a new relationship on that processor, which can then be connected to the appropriate follow-on processor.
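As a rough sketch of what those added properties might look like (the property names txt, tif, and xml are only illustrative; pick whatever names suit your flow), each property name becomes the relationship name, and each value is an Expression Language statement that evaluates to true or false against the "filename" attribute:

```
Property name    Value
txt              ${filename:toLower():endsWith('.txt')}
tif              ${filename:toLower():endsWith('.tif')}
xml              ${filename:toLower():endsWith('.xml')}
```

The toLower() call is an optional addition so that names such as REPORT.TXT also match. With the default "Route to Property name" routing strategy, FlowFiles that match none of the expressions go to the processor's "unmatched" relationship, which you can route to a catch-all processor or auto-terminate.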
Thanks,
Matt
Created 09-19-2016 08:28 PM
Wow, thanks. This may be the direction we're looking for, and it will certainly help. I do feel some additional Kafka questions coming along, however, particularly on the topic of linking ConsumeKafka with GetKafka and its properties, but this is definitely a big leap toward where we want to be.