Support Questions

alvinuw · ‎08-29-2016

Hi guys,

I have a couple of questions about NCM failure.

My understanding is the main responsibilities of NCM are:

• Communicates dataflow changes to the nodes

• Receives health and status information from the nodes

If the NCM fails, the existing dataflow can still run, but it can't be changed. In addition, new nodes can't join the cluster and dead nodes can't be detected.

Consider the following two scenarios:

1. A ListSFTP processor runs on the primary node and distributes file paths to the worker nodes running FetchSFTP.

If the NCM fails and one of the worker nodes also fails, will the primary node still send data to the dead worker node?

How does the primary node know that the node is dead?

2. If we set up a "DistributedMapCacheServer" on the NCM and the NCM later fails, does that mean the worker nodes can no longer access the cache server? Is there any way to make it highly available?

Thanks.

bbende · ‎08-29-2016

1) The Remote Process Group periodically checks with the NCM (I think once per minute) to get the status of the nodes in the cluster. If the NCM is down and one of the other nodes then fails, the primary node will still try to send data to the failed node, but it will get some kind of exception, move on, and try another node. So the primary node doesn't know the other node is dead; it just keeps trying nodes until one succeeds.

2) Yes, if you run the cache server on the NCM and the NCM fails, the other nodes can't access the cache server. The long-term solution is to use a true distributed cache (memcached, Redis, etc.) as the backing implementation that the cache client talks to, but this hasn't been implemented yet.
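
To make the behavior in point 1 concrete, here is a rough Python sketch of "try a node, get an exception, move on to the next one". It is not NiFi code: `NodeUnreachable`, `send_with_failover`, and the caller-supplied `send` function are hypothetical names used only for illustration, and the node list stands in for the last (possibly stale) status the sender got from the NCM.

```python
class NodeUnreachable(Exception):
    """Hypothetical stand-in for whatever connection error a failed send raises."""


def send_with_failover(payload, nodes, send):
    """Try each node in turn until one accepts the payload.

    nodes: the last node list the sender knows about (may be stale if the
           NCM is down and cannot report that a node has died).
    send:  caller-supplied function send(node, payload) that raises
           NodeUnreachable when the node cannot be reached.

    Returns (node_that_accepted, nodes_that_failed).
    """
    failed = []
    for node in nodes:
        try:
            send(node, payload)
            return node, failed
        except NodeUnreachable:
            # The sender only learns the node is dead by failing to reach it.
            failed.append(node)
    raise NodeUnreachable("no reachable node among %r" % (list(nodes),))
```

With a stale list such as `["node2", "node3"]` where node2 has died, the first attempt fails and the payload ends up on node3. The sender never "knows" node2 is dead ahead of time; it only finds out by trying.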

alvinuw · ‎08-29-2016

Hi @Bryan Bende

Thanks for your response.

Since NiFi 1.0 will have a zero-master design, will the MapCacheServer be redesigned for the new architecture?

Or will it still be available only on the elected leader node?

Thanks.

bbende · ‎08-29-2016

In 1.0.0, the difference will be that the cache server has to run on all nodes, because there is no longer a concept of choosing where a controller service runs (there is no master). The cache client would be configured to point to the cache server on one of the nodes, so if that node goes down there is still no automatic failover at this point.
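
As a toy illustration of the "no automatic failover" point, the Python sketch below models a cache client pinned to a single host. It is not NiFi's implementation: `PinnedCacheClient`, `CacheServerDown`, and the `servers` dictionary are hypothetical, and treating each node's server as a separate, unreplicated store is an assumption made for this model.

```python
class CacheServerDown(Exception):
    """Hypothetical error for an unreachable cache server."""


class PinnedCacheClient:
    """Toy model of a cache client configured with exactly one server host."""

    def __init__(self, server_host, servers):
        # servers maps host -> dict (that node's cache store),
        # or None if the node is down. Assumption: stores are not replicated.
        self.server_host = server_host
        self.servers = servers

    def get(self, key):
        store = self.servers.get(self.server_host)
        if store is None:
            # Other nodes may run a cache server too, but the client
            # only ever talks to the host it was configured with.
            raise CacheServerDown(self.server_host)
        return store.get(key)
```

For example, with `servers = {"node1": {"k": "v"}, "node2": {}}` and a client pinned to `"node1"`, `get("k")` works until node1 goes down (`servers["node1"] = None`). After that every lookup fails, even though node2 is still running its own cache server.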

Cloudera Community

If NiFi NCM fails, how about Loadbalancing, MapCacheServer