Support Questions

Find answers, ask questions, and share your expertise

After a week of running, NiFi cluster becomes unstable

Contributor

nifi-app.zip

After running for one week, our NiFi cluster becomes very unstable. Nodes disconnect and reconnect every 5-30 minutes, and processors do not work correctly either. Restarting all 3 nodes solves the issue.

Restarting NiFi weekly is not a good solution, but it is the only approach that works for us.

An example log file from one of the nodes is attached.

5 REPLIES

@Ramil Akhmadeev

There are multiple reasons your cluster could become unstable. Without having more information about your flow and resources available on the nodes, I would only be able to guess what the issue might be.

What version of NiFi are you running?

Contributor

Our NiFi has 8 GB of heap; the NiFi version is 1.1.0.2.
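For reference, the NiFi heap is configured in conf/bootstrap.conf. An 8 GB setup would look roughly like the following (the argument indices match the stock bootstrap.conf in NiFi 1.x; values are illustrative):

```properties
# conf/bootstrap.conf -- JVM heap settings (illustrative 8 GB sizing)
# Setting -Xms equal to -Xmx avoids heap resizing pauses.
java.arg.2=-Xms8g
java.arg.3=-Xmx8g
```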

@Ramil Akhmadeev

Are the systems NiFi is running on physical servers or VMs?

How many CPUs per system?

How are the disks configured? Multiple partitions or a single partition?

Are the zookeeper servers embedded or are they on separate systems?

What is the volume of data on the systems when you see the nodes disconnect?

There are errors in the log from the DetectDuplicate processor, have you tried to address that issue?

There are also a lot of socket timeout exceptions.
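As a side note, node-to-node socket timeouts can sometimes be mitigated by raising the cluster communication timeouts in conf/nifi.properties. A sketch, assuming NiFi 1.x property names (the defaults are 5 sec; the values below are illustrative, not a recommendation):

```properties
# conf/nifi.properties -- cluster communication timeouts
# Raising these from the 5 sec default gives a busy or co-located
# node more time to respond before it is marked disconnected.
nifi.cluster.node.connection.timeout=30 sec
nifi.cluster.node.read.timeout=30 sec
```

This only masks slow responses rather than fixing their cause, so it is worth ruling out resource contention first.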

Would you be willing to share a template of your flow?

Contributor

Our NiFi is co-located with other Hadoop components. These are physical servers.

24 cores per machine.

templates.zip

ZooKeeper is separate, but it runs on these same machines.

The errors on the DetectDuplicate processor are symptoms of this issue, as are the socket timeouts.

We have 3 process groups on our NiFi cluster; their templates are attached.

@Ramil Akhmadeev

I was able to load only two of the process groups. One has a custom processor named JsonDateEdit.

The DistributedMapCacheClientService controller service needs a corresponding DistributedMapCacheServer controller service to connect to.

I would attempt to determine why there are so many socket connection issues and eliminate them. Try reducing Max Total Connections on your DBCPConnectionPool controller service and see whether the socket warnings go away; this is most likely the cause of the stability issue.
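To make the two suggestions above concrete, here is roughly how the relevant controller-service properties line up in the NiFi 1.x UI (property names are real; the port and pool size are illustrative, and the hostname is a placeholder you must fill in):

```properties
# DistributedMapCacheServer (controller service, runs on one node)
#   Port = 4557                      # default listening port

# DistributedMapCacheClientService (controller service, used by DetectDuplicate)
#   Server Hostname = <node running the server>
#   Server Port     = 4557           # must match the server's Port

# DBCPConnectionPool (controller service)
#   Max Total Connections = 8        # try lowering from a larger value
```

The client and server must agree on the port, and every node's client service should point at the one node hosting the server.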