@Matt.Clarke I read an interesting reply to a question dated 3rd March https://community.hortonworks.com/questions/86732/failover-mechanism-in-nifi.html . You mentioned data HA across NiFi nodes is a future roadmap item. Has this been implemented? If not, is there a release version it is planned for? Also, what is the process for retrieving queued data from a failed NiFi node that cannot be restarted?
Hi @Ben Morris
This feature is still on the roadmap and it's not available yet: https://cwiki.apache.org/confluence/display/NIFI/High+Availability+Processing
What are you trying to achieve? Would RAID disks be an acceptable solution for you?
The data is time critical and must be as near to real time as possible. If a node goes down and there is a delay in that node's queued data getting to its destination, this would not be acceptable, as alerts could potentially be delayed. Could you please describe the process for migrating queued data from a failed node to a new node? My thought is that this could potentially be automated and might fit into an acceptable time frame for data to reach its destination.
Also... is there an estimate for when High Availability Processing will be available, or is there a workaround that could be put in place?
Hi @Ben Morris
I understand the requirement; I have the same need for a few use cases. Unfortunately, there's no ETA for this feature yet. This is something the community is aware of. Getting it done depends on priorities as well as on the complexity of the feature.
Regarding migration, data queued on a node can be processed again once the node is brought back online. If that is not possible, you can spin up a new node and configure it to use the existing repositories from the old node (they are not tied to a specific NiFi node). IMO this migration process will depend on your infrastructure. If you are on a bare-metal node with RAID local storage, it will take time, since you need to bring up a new physical node with the old disks (if node recovery is not possible). If you are on virtual infrastructure, the task is easier: you can create a new VM, install NiFi, and point it at the existing repositories. Here too, time and complexity will depend on your storage type (local or network).
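To illustrate, pointing a replacement node at the old node's repositories comes down to a few properties in nifi.properties. The paths below are illustrative assumptions (e.g. a mount of the old node's disks), not values from this thread:

```properties
# nifi.properties on the replacement node.
# The mount point /mnt/old-node is an assumed example; use wherever
# the failed node's repositories are actually attached.
nifi.flowfile.repository.directory=/mnt/old-node/flowfile_repository
nifi.content.repository.directory.default=/mnt/old-node/content_repository
nifi.provenance.repository.directory.default=/mnt/old-node/provenance_repository
```

On startup, NiFi will recover the queued FlowFiles from the FlowFile and content repositories, which is what makes this migration possible without the original node.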
Working on HA/fault tolerance with real-time requirements is not an easy task; there are a lot of things to consider around data duplication. I am thinking out loud here, but if you can afford an at-least-once strategy, you may be able to design your flow to achieve it (using a state backend). There's no easy standard solution, though. It will depend on your data source, your ability to deduplicate data, and so on. This is something I am currently working on.
In regards to flow design, do you mean that if a node goes down, you replay everything that has not been acknowledged as processed at the data source (JMS, Kafka) and then remove duplicates before pushing to the data's storage destination (Kafka, HDFS)? I think the Notify, Wait and DetectDuplicate processors would be useful in this case.
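For what it's worth, the replay-then-deduplicate idea can be sketched outside NiFi. DetectDuplicate essentially checks a key (typically a hash of a FlowFile attribute) against a distributed cache and routes duplicates away; the minimal Python sketch below mimics that logic with a plain set standing in for the cache. The function name, message IDs and cache are illustrative assumptions, not NiFi's actual implementation:

```python
import hashlib

def dedupe_replayed(messages, seen_cache):
    """Replay (msg_id, payload) pairs and drop any whose ID was already
    seen, mimicking DetectDuplicate against a DistributedMapCache."""
    delivered = []
    for msg_id, payload in messages:
        # Hash the identifying attribute, as DetectDuplicate hashes
        # the configured cache entry identifier.
        key = hashlib.sha256(msg_id.encode()).hexdigest()
        if key in seen_cache:
            continue            # duplicate from the replay; route away
        seen_cache.add(key)     # remember it for future replays
        delivered.append(payload)  # safe to push downstream (Kafka/HDFS)
    return delivered

# Before the crash, m1 and m2 were delivered; after the failure the
# source replays unacknowledged messages, including some duplicates.
cache = set()
print(dedupe_replayed([("m1", "a"), ("m2", "b")], cache))           # ['a', 'b']
print(dedupe_replayed([("m1", "a"), ("m2", "b"), ("m3", "c")], cache))  # ['c']
```

The important design point is that the cache must outlive any single node (in NiFi that is the DistributedMapCache service), otherwise a failover loses the deduplication state along with the node.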
I see that the feature is still pending on the wiki. Does that mean this HA feature is still not available today?
@Ben can you please share the solution proposed by Abdelkrim with Notify, Wait and DetectDuplicate? How do you store the latest Kafka offset you think you have read? And how does Notify work in this case?
I am not able to implement the cluster coordinator concept for NiFi UI HA.
Is there any link for that?
And is this (NiFi HA) still not implemented by Hortonworks?