We are trying to deploy NiFi for the first time, so this may be a noob question. We want to start with 2 NiFi nodes, 1 ZooKeeper and 1 NiFi Registry. We would need 1 static public IP, as NiFi will provide an endpoint for external services.
Any recommendation on how we should set this up in AWS? We have opted not to use AWS Marketplace images.
Since NiFi stores flowfiles, provenance data and other files on the node's local disk, will they be lost when the node is killed and a new node is respawned?
There are several aspects to this question:
1. Number of nodes: As seen in the documentation, production deployments should have at least three nodes (ZooKeeper would also need 3 nodes; if you start with really low volumes you could consider co-locating them with the NiFi nodes).
2. IP addresses: Typically all NiFi nodes need a public IP address (otherwise you usually cannot open the firewall for your data sources, though there may be other reasons as well).
3. Data: As long as the disk is not lost, you can attach it to a different NiFi node if one breaks down.
4. Deployment: The recommended deployment is via Ambari or Cloudera Manager. If you already have one of these in the cloud, you can simply add NiFi as a service. If not, it may be good to reach out to your local Cloudera contact person.
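To illustrate point 3, here is a minimal sketch of how the repository locations in nifi.properties could be pointed at a separately attached disk, so that the data survives when the node itself is replaced. The four property keys are the standard repository settings from nifi.properties; the function name `relocate_repositories`, the sub-directory names and the mount point `/mnt/nifi-data` are assumptions made for the example only, not part of any NiFi tooling.

```python
# Standard nifi.properties keys for the on-disk repositories, mapped to
# the (assumed) sub-directory each one should use on the attached disk.
REPO_SUBDIRS = {
    "nifi.flowfile.repository.directory": "flowfile_repository",
    "nifi.content.repository.directory.default": "content_repository",
    "nifi.provenance.repository.directory.default": "provenance_repository",
    "nifi.database.directory": "database_repository",
}


def relocate_repositories(properties_text: str, mount_point: str) -> str:
    """Return the properties text with repository paths moved under mount_point.

    Lines for other keys, comments and blank lines are passed through unchanged.
    """
    base = mount_point.rstrip("/")
    out = []
    for line in properties_text.splitlines():
        # The key is everything before the first '='; commented-out lines
        # start with '#', so their "key" never matches the table above.
        key = line.split("=", 1)[0].strip()
        if key in REPO_SUBDIRS:
            line = f"{key}={base}/{REPO_SUBDIRS[key]}"
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    sample = (
        "nifi.flowfile.repository.directory=./flowfile_repository\n"
        "nifi.web.http.port=8080\n"
    )
    # /mnt/nifi-data is a hypothetical mount point for the attached volume.
    print(relocate_repositories(sample, "/mnt/nifi-data"))
```

Each node would get its own volume mounted at that path. When a node is replaced, the volume is attached to the new instance and NiFi is started with the same properties.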
Thanks for the feedback.
I read somewhere that NiFi can be made stateless. Is there any way to run NiFi without storing data locally, and instead store the content and provenance data on a 'shared' disk for all nodes?
When you run NiFi as a microservice, you can configure PVCs [Persistent Volume Claims] using Helm on AKS or any other Kubernetes cluster, which ensures that even if the NiFi pods restart they will always have the same volume mounted.
Under the persistence configuration, the parameter persistence.enabled should be set to true; see the Helm Chart for Apache NiFi.