NiFi Filtering for Kafka Pipeline Purposes

Hello. I'm trying to create a file-type filtering system in NiFi. My team plans to ingest various types of files through Kafka and process their metadata for various purposes.

For now, I'm simply trying to understand how to route files based on their extension (e.g. .txt, .tif) from Kafka to the appropriate custom extractors that we're developing. Given my unfamiliarity with NiFi processor coding, I'm not sure how best to accomplish this task. As a Java programmer, I'd imagine creating an abstract class, then extending it with concrete versions based on what file type we need. For example:

fileName.endsWith(".xml")

-> send to the appropriate custom extractor, etc.

Based on my research thus far, I suspect that the RouteOnAttribute processor will be core to this, but I'm not 100% sure, and we lack a NiFi guru.

I just need a point in the right direction on how to start designing this pipeline filter within NiFi. Responses are appreciated.

P.S. If anyone can point out templates that may help, that would be further appreciated.

1 ACCEPTED SOLUTION

Super Mentor

@David Morris

The NiFi Expression Language can be used to route your data based on file extensions, as you have described.

When NiFi ingests data, a FlowFile is created. That FlowFile is a combination of the original content and metadata (attributes) about that content. Upon ingest, some attributes are created for every FlowFile. One of those attributes is named "filename" and contains the original filename of the ingested file.

The RouteOnAttribute processor can use the NiFi Expression Language to evaluate the FlowFile's "filename" attribute for routing purposes:

7741-screen-shot-2016-09-16-at-34705-pm.png

In the RouteOnAttribute processor, you would need to add a new property for each file extension type you want to look for:

7742-screen-shot-2016-09-16-at-34834-pm.png

Each of those newly added properties becomes a new relationship for that processor, which can then be routed to follow-on processors as seen in the example above.
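As a rough illustration, a property named txt could hold an expression such as ${filename:endsWith('.txt')}, with similar properties for .tif and .xml. The property names here are only assumptions for the example. Since you come from Java, the sketch below mimics that routing decision in plain Java. It does not use the NiFi API, the class and method names are hypothetical, and the lower-casing step is an assumption that corresponds to adding toLower() to the expression, i.e. ${filename:toLower():endsWith('.txt')}.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration only: plain Java that mimics how RouteOnAttribute
// evaluates the "filename" attribute. It does not use the NiFi API.
final class ExtensionRouter {

    // Property name (which becomes the relationship name) -> file extension.
    // The names and extensions are assumptions chosen for this example.
    private static final Map<String, String> ROUTES = new LinkedHashMap<>();
    static {
        ROUTES.put("txt", ".txt");
        ROUTES.put("tif", ".tif");
        ROUTES.put("xml", ".xml");
    }

    // Returns the name of the first route whose extension matches the
    // filename, or "unmatched" when none does. The extensions above are
    // mutually exclusive, so at most one route can match.
    static String route(String filename) {
        String lower = filename.toLowerCase(Locale.ROOT);
        for (Map.Entry<String, String> entry : ROUTES.entrySet()) {
            if (lower.endsWith(entry.getValue())) {
                return entry.getKey();
            }
        }
        return "unmatched";
    }

    public static void main(String[] args) {
        System.out.println(route("scan_001.tif")); // tif
        System.out.println(route("notes.TXT"));    // txt
        System.out.println(route("archive.zip"));  // unmatched
    }
}
```

In NiFi itself none of this code is needed: the properties do the matching, and anything that matches no property goes to the processor's "unmatched" relationship, which you can route or auto-terminate as you see fit.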

Thanks,

Matt

2 REPLIES

@mclark

Wow, thanks. This may be the direction we're looking for, and it will certainly help. I feel some additional Kafka questions coming along, however, particularly on the topic of linking ConsumeKafka with GetKafka and its properties, but this is definitely a big leap toward where we want to be.