
NiFi - Load Distribution in GetFile Processor


Re: NiFi - Load Distribution in GetFile Processor

Master Guru

@Manish Gupta

The concurrent tasks setting is per node, so setting concurrent tasks to 2 on a GetFile processor in a 4-node cluster can yield 8 active threads (2 per node). Processors are designed to avoid the race condition you describe here. In the case of the GetFile processor, once a file listing is generated, each subsequent thread is assigned a batch from that list to ingest, so there is no risk of multiple threads on the same processor picking up the same files from that listing. But when you have two separate GetFile processors listing and pulling from the same directory, you can have the race condition.
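To illustrate the idea, here is a minimal Python sketch of several threads draining one shared listing in batches. It is not NiFi's actual implementation; the names `ingest`, `max_active_threads`, and `batch_size` are made up for the illustration. The point is only that when all threads of one processor pull from a single listing, no file can be handed out twice.

```python
import queue
import threading


def max_active_threads(concurrent_tasks, nodes):
    """Concurrent tasks are configured per node, so the cluster-wide ceiling multiplies."""
    return concurrent_tasks * nodes


def ingest(listing, concurrent_tasks, batch_size=2):
    """Simulate ONE processor: its threads drain a single shared listing in batches."""
    pending = queue.Queue()          # thread-safe stand-in for the generated file listing
    for name in listing:
        pending.put(name)

    picked = []                      # (thread_id, filename) pairs, in pickup order
    lock = threading.Lock()

    def worker(thread_id):
        while True:
            batch = []
            for _ in range(batch_size):
                try:
                    batch.append(pending.get_nowait())
                except queue.Empty:
                    break
            if not batch:            # listing exhausted, thread is released
                return
            with lock:
                picked.extend((thread_id, f) for f in batch)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(concurrent_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return picked


files = [f"file_{i}.txt" for i in range(10)]
picked = ingest(files, concurrent_tasks=2)
names = [f for _, f in picked]
```

Every file in `files` shows up in `names` exactly once, whichever thread happened to grab it. Two separate processors would each be working from their own listing of the same directory, and that is where duplicates or collisions become possible.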

In your other example, ListenTCP (1 concurrent task) --> ConvertJSONToAvro (2 concurrent tasks) --> PutFile (1 concurrent task): when a single file is ingested by ListenTCP, it is placed on the success connection feeding ConvertJSONToAvro. Based on its run schedule, ConvertJSONToAvro will grab a thread and pull a file from that incoming connection. The second concurrent task would not get used. Now let's say a second file lands on that incoming connection. When ConvertJSONToAvro is next due to run, it will be able to use that second concurrent task to work on the new file. As each task completes, its thread is released. If a third file arrives while the first two are still being processed, ConvertJSONToAvro simply cannot run again until one of the first two completes.
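The scenario above can be sketched as a small scheduling simulation. This is a simplified model, not NiFi code: it ignores the run schedule, assumes every file takes the same amount of time to process, and the function name `start_times` is hypothetical.

```python
import heapq


def start_times(arrivals, duration, concurrent_tasks):
    """Return the time each file starts processing, given a cap on concurrent tasks."""
    busy_until = []                  # min-heap of finish times for threads currently in use
    starts = []
    for arrival in sorted(arrivals):
        # Release any threads that finished before this file arrived.
        while busy_until and busy_until[0] <= arrival:
            heapq.heappop(busy_until)
        if len(busy_until) < concurrent_tasks:
            start = arrival          # a concurrent task is free, start right away
        else:
            start = heapq.heappop(busy_until)   # wait for the earliest thread to finish
        starts.append(start)
        heapq.heappush(busy_until, start + duration)
    return starts


# Three files arrive one time unit apart; each takes 5 units; 2 concurrent tasks.
result = start_times([0, 1, 2], duration=5, concurrent_tasks=2)
```

With 2 concurrent tasks the first two files start as they arrive, and the third has to wait until the first one finishes at time 5. With 1 concurrent task the files are simply processed one after another.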

Each NiFi processor displays a number in its upper right corner that represents the total number of active threads for that processor across your cluster. Keep in mind that most processing happens so fast that you may not catch the number while it is displayed. Other times you may see a number, but since the browser only refreshes every 30 seconds by default, those threads may no longer be active. The key here is that if you see a queue building and a thread count consistently displayed in the upper right, that processor is potentially CPU intensive/bound or has hung threads.
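As a rough illustration of that rule of thumb, the check below takes a handful of observations written down across UI refreshes and flags the pattern described above. It does not call any NiFi API; the function name `looks_bottlenecked` and the `(queued_count, active_threads)` sample format are assumptions made for the sketch.

```python
def looks_bottlenecked(samples):
    """samples: list of (queued_count, active_threads) noted at successive UI refreshes.

    Flags a processor only when threads were active at every observation
    AND the incoming queue grew between every pair of observations.
    """
    if len(samples) < 2:
        return False                 # a single snapshot says nothing about a trend
    always_active = all(threads > 0 for _, threads in samples)
    queue_growing = all(later[0] > earlier[0] for earlier, later in zip(samples, samples[1:]))
    return always_active and queue_growing


suspect = looks_bottlenecked([(10, 2), (25, 2), (40, 2)])   # queue grows, threads always busy
healthy = looks_bottlenecked([(10, 2), (5, 0), (0, 0)])     # queue drains, threads go idle
```

A positive result only says the processor deserves a closer look; it cannot tell a CPU-bound processor apart from one with hung threads.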

Hope this clarifies how NiFi handles multi-threading in its processors.




Re: NiFi - Load Distribution in GetFile Processor


This is some really good stuff! Thank you, Matt.
