Member since: 07-18-2017
Posts: 16
Kudos Received: 0
Solutions: 0
07-17-2018
10:56 PM
Thanks @ashok.kumar. I accepted the @OnScheduled answer.
07-13-2018
06:40 PM
Thanks @ashok.kumar, that really clarifies it. Just to make sure I understand: in your example, will the function be called once for the entire 1000 files, or will it be called 10 times (i.e., once for each run)? Is there any difference if my processor's run schedule is 0 seconds? I understood from your answer that @OnScheduled runs once for the 1000 flow files, but if that's the case, when does @OnUnscheduled get called? Thanks again ashok. This discussion is really helpful for me.
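One way to settle the once-versus-ten-times question empirically is a bare-bones probe processor that logs each lifecycle callback. The class below is only an illustrative sketch (its name and body are not from this thread): start it, push the 1000 flow files through, and count the lines in nifi-app.log. @OnScheduled should appear once per start/stop cycle, while onTrigger appears once per invocation.

import org.apache.nifi.annotation.lifecycle.OnScheduled;
import org.apache.nifi.annotation.lifecycle.OnUnscheduled;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;

public class LifecycleProbeProcessor extends AbstractProcessor {

    @OnScheduled
    public void logScheduled(final ProcessContext context) {
        // Fires when the processor is started (scheduled to run),
        // not once per flow file or per onTrigger invocation.
        getLogger().info("@OnScheduled fired");
    }

    @OnUnscheduled
    public void logUnscheduled() {
        // Fires when the processor is stopped (no longer scheduled).
        getLogger().info("@OnUnscheduled fired");
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        final FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        getLogger().info("onTrigger fired");
        session.remove(flowFile); // just consume the flow file so invocations can be counted
    }
}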
07-13-2018
03:40 PM
Thanks @ashok.kumar. I just want to make sure I understand the behavior of @OnScheduled. Let's say I have 200 flow files that arrive in 2 bursts (i.e., 100 flow files per burst) and there is only one thread available to NiFi. Let's assume I started the custom processor once before the arrival of the first burst, so I assume the model will not be loaded yet. Now, when the first burst arrives, the @OnScheduled method will be called and the model is loaded once for the entire execution of the 100 flow files. But let's assume another processor needs to execute before the second burst arrives. At this point I believe the memory used by the model will be deallocated and then reallocated when the second burst arrives. Is that the correct behavior of the @OnScheduled method?
07-13-2018
01:20 PM
Thanks @ashok.kumar. The problem is that one of the properties of my custom processor is "Model directory", so in the init method I have no access to this property because it is still being initialized. I am considering loading the model in the @OnScheduled method, but I do not know whether @OnScheduled is executed once per flow file or once per multiple flow files.
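For what it's worth, @OnScheduled methods may declare a ProcessContext parameter, and by the time they run the user-configured property values are resolved, which init(ProcessorInitializationContext) cannot offer. Below is a minimal sketch of reading a "Model directory" property there; the descriptor details, the "model.bin" file name, and the byte-array model representation are assumptions for illustration, not the thread's actual processor.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;

import org.apache.nifi.annotation.lifecycle.OnScheduled;
import org.apache.nifi.components.PropertyDescriptor;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.exception.ProcessException;
import org.apache.nifi.processor.util.StandardValidators;

public class ModelDirectoryProcessor extends AbstractProcessor {

    // "Model directory" is the property named in the post; the validator is an assumption.
    static final PropertyDescriptor MODEL_DIRECTORY = new PropertyDescriptor.Builder()
            .name("Model directory")
            .description("Directory containing the serialized model")
            .required(true)
            .addValidator(StandardValidators.NON_EMPTY_VALIDATOR)
            .build();

    private volatile byte[] modelBytes; // hypothetical in-memory model representation

    @Override
    protected List<PropertyDescriptor> getSupportedPropertyDescriptors() {
        return Collections.singletonList(MODEL_DIRECTORY);
    }

    @OnScheduled
    public void loadModel(final ProcessContext context) {
        // By the time the processor is scheduled, the configured property
        // values are available, unlike inside init().
        final String dir = context.getProperty(MODEL_DIRECTORY).getValue();
        try {
            // "model.bin" is a placeholder file name for illustration.
            modelBytes = Files.readAllBytes(Paths.get(dir, "model.bin"));
        } catch (final IOException e) {
            throw new ProcessException("Failed to load model from " + dir, e);
        }
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        // Prediction logic omitted in this sketch; the loaded modelBytes would be used here.
        context.yield();
    }
}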
07-10-2018
05:49 PM
Hi, I am building a NiFi custom processor that runs a prediction from a machine learning model for every flow file. I need to load the model into memory once when I start the processor, and I use it in the onTrigger function for every flow file. Is there an @onStart function that I can override?
Labels:
- Apache Oozie
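The closest thing to an @onStart hook here is NiFi's lifecycle annotations in org.apache.nifi.annotation.lifecycle. Below is a minimal sketch of the load-once pattern under discussion, with placeholder model and helper methods rather than an actual implementation: @OnScheduled loads the model before any onTrigger call, and @OnStopped releases it.

import java.util.Collections;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;

import org.apache.nifi.annotation.lifecycle.OnScheduled;
import org.apache.nifi.annotation.lifecycle.OnStopped;
import org.apache.nifi.flowfile.FlowFile;
import org.apache.nifi.processor.AbstractProcessor;
import org.apache.nifi.processor.ProcessContext;
import org.apache.nifi.processor.ProcessSession;
import org.apache.nifi.processor.Relationship;
import org.apache.nifi.processor.exception.ProcessException;

public class PredictingProcessor extends AbstractProcessor {

    static final Relationship REL_SUCCESS = new Relationship.Builder()
            .name("success")
            .description("Flow files that were scored successfully")
            .build();

    // Hypothetical handle for the loaded model; substitute the real model type.
    private final AtomicReference<Object> model = new AtomicReference<>();

    @Override
    public Set<Relationship> getRelationships() {
        return Collections.singleton(REL_SUCCESS);
    }

    @OnScheduled
    public void loadModel(final ProcessContext context) {
        // Runs when the processor is started, before onTrigger is invoked,
        // so the (hypothetical) load happens once rather than per flow file.
        model.set(loadModelFromDisk());
    }

    @Override
    public void onTrigger(final ProcessContext context, final ProcessSession session) throws ProcessException {
        final FlowFile flowFile = session.get();
        if (flowFile == null) {
            return;
        }
        predict(model.get(), flowFile); // reuse the already-loaded model
        session.transfer(flowFile, REL_SUCCESS);
    }

    @OnStopped
    public void releaseModel() {
        // Runs after the processor is stopped and its active threads have returned,
        // a natural place to drop the model and free its memory.
        model.set(null);
    }

    private Object loadModelFromDisk() { return new Object(); } // placeholder loader
    private void predict(final Object model, final FlowFile flowFile) { } // placeholder scoring
}

@OnUnscheduled would also work for cleanup, but it fires as soon as the processor stops being scheduled, while onTrigger threads may still be running; @OnStopped waits for those threads to finish, so it is the safer place to free a model that onTrigger still touches.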
01-18-2018
09:07 PM
@Matt Burgess Any insights about the expected behavior with "Maximum Timer Driven Thread Count = 1" and two processors, each with Max Concurrent Tasks = 1? Any pointers would be greatly appreciated.
01-11-2018
08:57 PM
Thanks @Matt Burgess for your response. I observe the behavior that you mentioned when the "Maximum Timer Driven Thread Count" is more than 1. But I explicitly went to the Controller Settings in the global menu and set the "Maximum Timer Driven Thread Count" to 1. In this case, I was expecting that each processor would request one thread (Max Concurrent Tasks = 1), but since there is only one thread in the NiFi instance, only one processor can be using the thread at a given time, which results in something similar to sequential execution. Any insights about the expected behavior when "Maximum Timer Driven Thread Count = 1"? My main goal is to predict the execution time of NiFi when there is a limited number of threads compared to the number of concurrent tasks, in order to know how to provision my NiFi cluster and place my processors. Currently I see the same behavior with thread count = 5 and thread count = 1, so I am not sure how NiFi enforces this parameter. Any insights are much appreciated. Below is a snapshot of my thread count settings:
01-10-2018
06:56 PM
I have an interesting issue with NiFi concurrency. I have a dataflow with two connected ExecuteStreamCommand processors (each with Concurrent Tasks = 1), but when I set the number of threads of the whole instance to 1, the two processors still manage to somehow run concurrently, although I expect them to run sequentially. Each processor makes a call to Python code as part of its execution. The first processor takes on average 2.5 seconds per flow file and the second processor takes on average 4.5 seconds per flow file. I gave the dataflow 100 flow files and was expecting all flow files to finish in around 700 seconds (i.e., sequential execution), but they manage to finish in 480 seconds, which suggests that each processor is using a separate thread and they do not wait on each other. Am I missing something here? Note: the command that I run in the ExecuteStreamCommand processor is a call to Python code that does a busy wait. For example, the following is a snippet that busy-waits for 2.5 seconds:

import time

sleep_time = 2.5  # busy-wait duration in seconds
current_time = time.time()
while time.time() < current_time + sleep_time:
    pass
Labels:
- Apache NiFi
11-17-2017
04:40 PM
Hi @Matt Clarke, I have a related question regarding concurrency. I have a dataflow with two connected processors (each with Concurrent Tasks = 1), but when I set the number of threads of the whole instance to 1, the two processors still manage to somehow run concurrently, although I expect them to run sequentially. The first processor takes on average 2.5 seconds per input and the second processor takes on average 4.5 seconds. I gave it 100 inputs and was expecting it to finish in around 700 seconds (i.e., sequential execution), but it still manages to finish in 480 seconds, which suggests that each processor is using a separate thread and they do not wait on each other. Am I missing something here?
11-16-2017
05:32 PM
Thanks @Shu, that was really helpful. I am just wondering: when I set the number of threads of the whole instance to 1 and I have two processors connected to each other, they still manage to somehow run concurrently. The first processor takes on average 2.5 seconds per input and the second processor takes on average 4.5 seconds. I gave it 100 inputs and was expecting it to finish in around 700 seconds (i.e., sequential execution), but it still manages to finish in 480 seconds, which suggests that each processor is using a separate thread and they do not wait on each other. Am I missing something here?