Member since: 09-13-2017
Posts: 20
Kudos Received: 0
Solutions: 0
11-01-2018
06:15 PM
Hello Jackson, this is how we did it.

Step 1: Create an UpdateAttribute processor with the following three attributes: schemaStart, schemaEndTags, and recordDateSchemaElement.

schemaStart:
{ "type": "record", "name": "Test_Schema", "fields": [ {"name": "id", "type": ["string","null"]}, {"name": "operation", "type": ["string","null"]}

schemaEndTags:
]}

recordDateSchemaElement:
${record_date:isEmpty():ifElse('', ', {"name": "record_date", "type": ["string","null"]}')}

Step 2: Add another UpdateAttribute processor connected from the one above, with:

completeSchema:
${allAttributes("schemaStart", "recordDateSchemaElement", "schemaEndTags"):join(" ")}

Now your completeSchema will contain the record_date element only when record_date is NOT empty. Hope this helps.
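To make the conditional assembly concrete, here is a small Python sketch (purely illustrative; it mimics the isEmpty():ifElse() and join(" ") logic rather than using any NiFi API) showing that the three fragments form a valid Avro schema with or without record_date:

```python
import json

def build_schema(record_date):
    """Mimic the UpdateAttribute logic: append the optional
    record_date field only when the attribute is non-empty."""
    schema_start = ('{ "type": "record", "name": "Test_Schema", "fields": ['
                    ' {"name": "id", "type": ["string", "null"]},'
                    ' {"name": "operation", "type": ["string", "null"]}')
    schema_end_tags = ']}'
    # Equivalent of ${record_date:isEmpty():ifElse('', ', {...}')}
    record_date_element = ('' if not record_date
                           else ', {"name": "record_date", "type": ["string", "null"]}')
    # Equivalent of allAttributes(...):join(" ")
    return ' '.join([schema_start, record_date_element, schema_end_tags])

schema = json.loads(build_schema('2018-11-01'))
print([f['name'] for f in schema['fields']])
```

Parsing the joined string with json.loads confirms both branches produce well-formed schemas.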
08-03-2018
03:59 PM
Thanks for that reply, Matt. Here is the scenario: we have 20 NiFi flows, each using a PutHiveStreaming processor and each expected to handle around 25k flow files per minute (after a merge step ahead of the PutHiveStreaming processor). At that peak load, the PutHiveStreaming processor queues up a lot of flow files and ingestion into the target tables slows down. Is this because of the peak load on the Hive metastore? If so, how can I minimize the load that PutHiveStreaming puts on the metastore? Currently we have two metastore hosts.
08-02-2018
03:39 PM
We have multiple Hive metastore server instances, and we give the metastore URIs directly in the PutHiveStreaming processor in the form thrift://host1:port1,thrift://host2:port2. Does that mean the second instance takes over only when the first is down (failover / high availability), or is the load shared between the two instances all the time?
05-15-2018
04:04 PM
I have the target field type in Hive as timestamp, and from the source I get JSON that has either a proper timestamp field, or "" or null sometimes. I am converting the source with JsonToAvro before using the PutHiveStreaming processor. The records with a proper timestamp format get into my Hive target table successfully. But those with ""/null (empty string set) values show the error: Illegal format. Timestamp format should be "YYYY-MM-DD HH:MM:SS[.fffffffff]". I know it works if I default the field to some date when it is null/empty, but I do not want that. I want it to be null in my target table when it is null. How can I achieve this?
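One workaround (a sketch of the general idea, not a specific NiFi recipe) is to normalize the JSON before JsonToAvro - for example in an ExecuteScript step or an upstream job - so that empty-string timestamps become JSON null; with a nullable Avro union type, Hive can then store them as NULL. The field name event_ts below is hypothetical:

```python
import json

def nullify_empty_timestamps(record, timestamp_fields):
    """Replace ""/missing values in the given timestamp fields with None
    so they serialize as JSON null instead of an empty string."""
    for field in timestamp_fields:
        if record.get(field) in ('', None):
            record[field] = None
    return record

# Hypothetical source record with an empty timestamp field.
raw = '{"id": "1", "event_ts": ""}'
cleaned = nullify_empty_timestamps(json.loads(raw), ['event_ts'])
print(json.dumps(cleaned))
```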
Labels:
- Apache Hive
10-04-2017
02:30 AM
@bkosaraju I have these attributes in my custom property file, and I would like to know whether there is any way to use prop_3 directly, without an intermediate processor such as UpdateAttribute. Thanks
10-04-2017
02:24 AM
Please note that I have these properties defined in a custom property file.
10-04-2017
12:34 AM
How would I use one custom property within another in NiFi? @Pierre Villard @Matt Burgess

E.g., I have:
prop_1=/path/to/dir1
prop_2=/path/to/dir2
prop_3=${prop_1}/${prop_2}/file1
prop_4=${prop_1}/${prop_2}/file2

Now, within a NiFi processor property that supports Expression Language, when I give ${prop_3} or ${prop_4} I get the error "${prop_1} is not a file or directory". But if I provide the value as ${prop_1}/${prop_2}/file1 directly, it works. What is the problem when I give ${prop_3}? Thanks, John
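For illustration only (this is Python, not NiFi internals): the error described is what you see when property references are resolved in a single, non-recursive pass, so a nested reference like ${prop_3} surfaces its inner ${prop_1} literally instead of expanding it. A sketch of the difference between one-pass and recursive expansion:

```python
import re

# Stand-in for the custom property file from the question.
props = {
    'prop_1': '/path/to/dir1',
    'prop_2': '/path/to/dir2',
    'prop_3': '${prop_1}/${prop_2}/file1',
}

def expand_once(value, props):
    """Single-pass substitution: nested references survive as literals."""
    return re.sub(r'\$\{(\w+)\}',
                  lambda m: props.get(m.group(1), m.group(0)), value)

def expand_recursive(value, props, depth=10):
    """Keep substituting until nothing changes (with a depth guard)."""
    for _ in range(depth):
        new = expand_once(value, props)
        if new == value:
            return new
        value = new
    return value

print(expand_once('${prop_3}', props))      # still contains ${prop_1}, ${prop_2}
print(expand_recursive('${prop_3}', props)) # fully resolved path
```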
Labels:
- Apache NiFi
10-03-2017
05:09 PM
Yes Paras, it is clear now. Thanks. However, any input on the following is highly appreciated: currently I have NiFi running on an edge node that has 4 cores. Say I have 20 incoming flow files and I set Concurrent Tasks to 10 on an ExecuteStreamCommand processor; do I get only concurrent execution, or both concurrent and parallel execution?
10-02-2017
09:49 PM
Thanks for your reply, Paras. Currently I designed the flow as SelectHiveQL (reading as CSV instead of the default Avro) -> SplitText (by line) -> ExtractText (assigning the content of each split file to an attribute). This is good so far: every value of my query result is associated with a flow file attribute. I believe this is what you were also suggesting, just put differently.
Now the question is about the ExecuteStreamCommand processor, where I pass the flow file attribute to the command arguments. Could you please clarify whether one task handles one spark-submit command, with the attribute taken from one flow file at a time? Is my understanding correct? I remember reading somewhere that one task in NiFi can process multiple flow files at a time, so I wanted to understand how flow files are handled by NiFi processor tasks. Regards, John
10-02-2017
04:19 PM
Hi, I have a scenario and I would like your suggestions on how to achieve it in NiFi.

Step 1: Query a Hive table and get the list of values from a particular column.
Step 2: Run a Spark job for each value, passing the value as one of the parameters to the spark-submit job. These Spark jobs have to execute in parallel.

So, if today the query result gives me two values for the column, the flow should trigger two spark-submit jobs that run in parallel; if tomorrow the result gives me 10 values, 10 jobs should start in parallel. Of course, I understand that when resources are not available, it cannot start all the jobs. Please advise.

On a different note, I would like to know how a processor typically deals with incoming flow files. Does it process one flow file after another, or does it take a set of flow files and execute all of them in parallel? Thanks, John
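Outside of NiFi, the fan-out pattern described (one spark-submit per column value, all running in parallel) can be sketched with a thread pool. launch_job below is a hypothetical placeholder for the real spark-submit invocation, and column_values stands in for the Hive query result:

```python
from concurrent.futures import ThreadPoolExecutor

def launch_job(value):
    """Placeholder for invoking spark-submit with `value` as a parameter,
    e.g. via subprocess.run(["spark-submit", ..., value])."""
    return f"submitted job for {value}"

# Stand-in for the column values returned by the Hive query.
column_values = ["2017-10-01", "2017-10-02"]

# One worker per value, so all jobs are launched in parallel.
with ThreadPoolExecutor(max_workers=len(column_values)) as pool:
    results = list(pool.map(launch_job, column_values))

print(results)
```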
Labels:
- Apache NiFi
- Apache Spark
10-02-2017
02:55 PM
Hi Wyner, yes, after I disabled and re-enabled the MapCache service configuration, it was working fine. Thanks for that.
09-26-2017
06:40 PM
I am using the file name as my signal identifier. File names before Notify and Wait are all renamed to a specific value, and the count of these signals is what I am checking.
09-26-2017
06:37 PM
Thanks for your reply, Wyner. Here is a screenshot of the flow that I simulated from the actual ones we have, along with the configurations of the Wait and Notify processors I am using:

1. GenerateFlowFile generates a flow file every 5 seconds.
2. Flow files are renamed to 'release_signal' (this is what I am using as the release signal identifier in the Wait and Notify processors).
3. When there are 5 such signals, I want the Wait processor to push all 5 flow files to the downstream success relationship. The schedule on the Wait processor is 10 sec.

Given the above, I expect the success relationship from the Wait processor to receive flow files in steps of 5 (5, 10, 15, ...), but that is not what I see; you can see it in the screenshot too. There are 13 files in total pushed to the success relationship from Wait. Sometimes 5 flow files are pushed at once, but sometimes even a single flow file is pushed to success from Wait, and I don't understand why that happens. Since I could not get this working as expected, I switched to using MergeContent, and it works for the use case I have at hand.

(Screenshots attached: Flow, UpdateAttribute, Notify, and Wait configurations.)
09-18-2017
09:46 PM
Hi Wyner, see wait-processor.jpeg - you can see that I am waiting for 10 signals here.
09-18-2017
03:28 AM
Hi, could you please let me know how the resetting of the target signal count in the Wait processor works? Refer to my question: https://community.hortonworks.com/questions/138762/reset-of-target-signal-count-in-wait-processor.html Thanks, John
09-15-2017
07:33 PM
I have a flow in which I wait for 5 files to come out of the respective execute processors, after which I start another processor. I achieved this using the Wait and Notify processors with a target signal count of 5. When I stop and restart the flow, it works as expected. Now, my question is: when and how does this counter get reset? If the execute processors are scheduled to run every 30 minutes, do the signals from Notify keep increasing the counter? If so, how does my Wait processor ever match the signal count of 5 that I gave and proceed to the next processor every time? Thanks, John
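As a way to reason about the question, here is a toy model (not NiFi's actual implementation; the consume-on-release behavior is an assumption) in which Notify increments a counter and Wait releases once the target is reached, consuming the signals it released. Under that model the counter is reduced by the target count each time Wait fires, so leftover signals carry into the next round rather than growing without bound:

```python
class SignalCounter:
    """Toy model of Wait/Notify: notify() increments the signal count;
    wait() releases when the target is reached and consumes those signals.
    (Assumption: signals are consumed on release, not globally reset.)"""

    def __init__(self, target):
        self.target = target
        self.count = 0

    def notify(self):
        self.count += 1

    def wait(self):
        if self.count >= self.target:
            self.count -= self.target  # consume the signals that triggered the release
            return True
        return False

c = SignalCounter(target=5)
for _ in range(7):          # e.g. 7 notifications arrive over several runs
    c.notify()
released = c.wait()
print(released, c.count)    # the 2 extra signals remain for the next cycle
```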
Labels:
- Apache NiFi
09-15-2017
07:22 PM
Thanks Ajay. Would this be as good as having a sqoop export in a bash script and calling that from an ExecuteStreamCommand processor? I have millions of records to push to Postgres.
09-13-2017
07:27 PM
I have data in Hive tables that I would like to push to tables in Postgres. How can I do this using NiFi processors? What sequence of processors can I use for this use case? Please advise. Also, I would like to know whether NiFi is efficient for this when I have millions of records to write to Postgres. Thanks, John
Labels:
- Apache Hive
- Apache NiFi