About Threepwood

Threepwood · ‎05-17-2022

Many thanks. I want to confirm one more thing: it seems each content-accessing processor needs to read content from disk, even when two such processors are directly connected to each other. For example, I have a ConsumeKafkaRecord_2_0 that leads to a PutElasticsearchHttpRecord; the former writes to disk, while the latter reads from disk. However, if the content could be cached in memory (while also being synced to disk), one disk I/O would be saved. Are there any configuration properties that make content cached in memory? If there were such an option, it should improve the overall throughput; otherwise, it seems better to merge all content-accessing processors into a single processor to save disk I/O. Is that correct?

Threepwood · ‎05-08-2022

In Apache NiFi, there are connections between processors, which act as queues of FlowFiles, and NiFi by default persists the content of FlowFiles on disk. Does that mean each such connection persists FlowFiles on disk? If so, each delivery of a FlowFile from one processor to another would mean one disk write and one disk read, so more processors would lead to more disk reads and writes, which in turn would lower the overall throughput. Is my understanding correct? And what is the best practice to avoid it: writing everything in one processor? Thanks.

Last Visited	‎05-24-2022 08:20 AM

Member Since	‎05-08-2022 08:35 PM
Posts	2

Cloudera Community

Re: Does more processors in Apache Nifi lead to lo...

Does more processors in Apache Nifi lead to lower ...