I'm new to NiFi, but I've come a long way in terms of understanding how to create flows that pick up data from various sources and drive the transformed data to a database.
I need to add topical event publication to my NiFi workflows. In terms of requirements, JS clients will subscribe to named topics and expect to receive topical messages asynchronously.
I may also have to set up intermediate processors that subscribe to this topical data and perform some business logic against it.
I've been looking at using Kafka, Spark, or Storm in this context. I've also considered WebSockets, or better yet Socket.IO. I've done a bit of reading on the advantages and disadvantages of each. At this point I just want to select an architectural starting point in order to enhance my learning.
Do any of the experts have a recommendation on a starting point? I have a set of NiFi flows, some of which need to publish to a proper event stream. Where should I start?
Could you not use NiFi for the ingestion from source to a Kafka topic? Then use Spark Streaming to read from Kafka, perform the business logic, and write to another Kafka topic. NiFi can then collect and route to the endpoint.
SAM/Storm is an option if there is a true real-time requirement.
NiFi -> Kafka -> Spark Streaming -> NiFi has the advantage of (if required) unified security and governance in Ranger/Atlas, along with lineage.
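To make the middle hop concrete, here is a minimal sketch of the per-record business logic that Spark Streaming (or any Kafka consumer) could apply between the two topics. The field names (`payload`, `value`), the `processed` flag, and the `enrich` function are illustrative assumptions, not part of the original flows:

```python
import json

def enrich(raw_message: str) -> str:
    """Hypothetical business-logic step: parse a raw JSON event read from
    the input Kafka topic, tag it, and serialize it for the output topic.
    The threshold and field names here are placeholder assumptions."""
    event = json.loads(raw_message)
    event["processed"] = True
    # Tag the event based on a value in its payload (illustrative rule).
    value = event.get("payload", {}).get("value", 0)
    event["tag"] = "priority" if value > 100 else "normal"
    return json.dumps(event)

# In Spark Streaming, a function like this would run inside a map over
# the stream consumed from the input topic, with the result published to
# the output topic that the JS clients (or downstream NiFi flow)
# ultimately subscribe to.
```

For example, `enrich('{"payload": {"value": 150}}')` produces an event tagged `"priority"`, while a value at or below the threshold is tagged `"normal"`. Keeping the logic in a pure function like this makes it easy to unit test outside the streaming framework.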