Processor is not working if the connection queue before the processor is full

New Contributor

I have a problem: a processor whose incoming connection queue is full (back pressure is applied) does not process anything at all, as shown in the picture below.


Once I delete the queue it works fine, but that is not an option once the flow goes into production.

Can someone tell me why this is happening and how to fix it?

Also, is there any way to improve the I/O throughput of PutHDFS other than assigning more concurrent tasks to it?


[Screenshot: 109926-1563525567222.png]


Thank you.

3 REPLIES

Re: Processor is not working if the connection queue before the processor is full

Super Guru

@Micro

Change PutHDFS to run on all nodes. Right now the processor is running only on the primary node.

If all of the queued FlowFiles (1,328) are on nodes other than the primary node, the PutHDFS processor will not process them until the primary node changes.
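
To illustrate the point above, here is a small toy model in Python. It is not NiFi code and does not use any NiFi API; the `Node` class, the `run_once` function and the three-node cluster are all made up for the illustration. It only assumes that each node holds its own part of the queue and that a processor only reads the queue on the nodes where it is scheduled.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    is_primary: bool = False
    queue: list = field(default_factory=list)  # FlowFiles held locally on this node


def run_once(nodes, execution="primary"):
    """Drain the local queue on every node where the processor is scheduled.

    execution="primary" models a processor set to run on the primary node only;
    execution="all" models a processor that runs on every node.
    Returns the list of FlowFiles that were processed.
    """
    if execution not in ("primary", "all"):
        raise ValueError("execution must be 'primary' or 'all'")
    processed = []
    for node in nodes:
        if execution == "all" or node.is_primary:
            processed.extend(node.queue)
            node.queue.clear()
    return processed


def make_cluster():
    # Hypothetical cluster: every queued FlowFile sits on a non-primary node.
    return [
        Node("node-1", is_primary=True),
        Node("node-2", queue=["a", "b"]),
        Node("node-3", queue=["c"]),
    ]


# Primary-only execution: nothing is processed, the queue stays as it was.
cluster = make_cluster()
stuck = run_once(cluster, execution="primary")
remaining = sum(len(n.queue) for n in cluster)

# All-nodes execution: each node drains its own part of the queue.
cluster_all = make_cluster()
done = run_once(cluster_all, execution="all")
```

In this sketch the primary-only run leaves all three FlowFiles queued, while the all-nodes run processes them, which matches the behaviour described above.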


If your NiFi cluster has several nodes, running on all nodes will give you the best result (but you need to distribute the load across the cluster by using load balancing or a Remote Process Group).

Increasing concurrent tasks is one of the optimizations you can apply to the PutHDFS processor when you are not distributing the load across the cluster, that is, when one node is doing all the work.

Re: Processor is not working if the connection queue before the processor is full

New Contributor

@Shu Thank you, you are right. It now processes the queue, and faster than before.

I am just wondering about combining load balancing with concurrent tasks. If we assign concurrent tasks to a CPU-intensive processor such as ExecuteScript or ExecuteStreamCommand, which runs on all nodes, will load balancing the queued data before the processor give a better result than simply running it with concurrent tasks? I ask because I thought it takes some time to distribute relatively large data (~5 MB) across the cluster.

And why not use PutHDFS with some concurrent tasks and load balancing together?
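
One way to frame this trade-off is a rough back-of-envelope estimate. The sketch below is only an illustration, not a measurement and not anything NiFi provides: the function names and every number in it are hypothetical. It assumes the source node has enough cores to really run all concurrent tasks in parallel, that all files start on one node, and that transfer and processing do not overlap, so the load-balanced figure is a pessimistic upper bound.

```python
import math


def single_node_seconds(n_files, proc_s, tasks):
    """Time for one node to process every file with `tasks` concurrent tasks."""
    return math.ceil(n_files / tasks) * proc_s


def balanced_seconds(n_files, proc_s, tasks, nodes, size_mb, net_mb_per_s):
    """Rough upper bound: ship files to the other nodes, then process in parallel."""
    per_node = math.ceil(n_files / nodes)
    shipped = n_files - per_node  # files that must leave the source node
    transfer = shipped * size_mb / net_mb_per_s
    return transfer + math.ceil(per_node / tasks) * proc_s


# Hypothetical workload: 1200 files of 5 MB, 3 nodes, 4 tasks per node,
# and an assumed 100 MB/s of usable network bandwidth.
common = dict(n_files=1200, tasks=4)
cluster = dict(nodes=3, size_mb=5, net_mb_per_s=100)

# CPU-heavy step (0.5 s per file): spreading the work pays for the transfer.
heavy_single = single_node_seconds(proc_s=0.5, **common)
heavy_balanced = balanced_seconds(proc_s=0.5, **common, **cluster)

# Cheap step (0.05 s per file): the transfer costs more than it saves.
light_single = single_node_seconds(proc_s=0.05, **common)
light_balanced = balanced_seconds(proc_s=0.05, **common, **cluster)
```

With these made-up numbers, load balancing comes out ahead for the CPU-heavy step and behind for the cheap one, so the answer depends on how the per-file processing time compares with the per-file transfer time in your own cluster.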

Re: Processor is not working if the connection queue before the processor is full

Community Manager

The above question and the entire response thread below were originally posted in the Community Help track. On Sat Jul 20 16:18 UTC 2019, a member of the HCC moderation staff moved it to the Data Ingestion & Streaming track. The Community Help Track is intended for questions about using the HCC site itself.

Bill Brooks, Community Manager