What is the best starting point for topical streaming with NIFI

New Contributor

I'm new to NiFi, but I've come a long way in understanding how to create flows that pick up data from various sources and write the transformed data to a database.

I need to add topic-based event publication to my NiFi workflows. The requirement is that JavaScript clients subscribe to named topics and receive the messages for those topics asynchronously.

I may also have to set up intermediate processors that subscribe to the same topic data and apply some business logic to it.
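To make the requirement above concrete, here is a toy sketch of the topic semantics in Python. It is an in-memory stand-in, not Kafka, socket.io, or anything NiFi provides; `TopicBroker`, the `orders` and `alerts` topic names, and the message shape are all hypothetical. It only shows the behaviour being asked for: each subscriber of a named topic gets its own copy of every message, asynchronously, whether it is an end client or an intermediate processor.

```python
import asyncio
from collections import defaultdict


class TopicBroker:
    """Toy in-memory broker (hypothetical): one asyncio.Queue per subscriber, grouped by topic."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic):
        # Each subscriber gets a private queue so it can consume at its own pace.
        queue = asyncio.Queue()
        self._subscribers[topic].append(queue)
        return queue

    def publish(self, topic, message):
        # Fan out: every subscriber of this topic receives the message.
        for queue in self._subscribers[topic]:
            queue.put_nowait(message)


async def demo():
    broker = TopicBroker()
    client_inbox = broker.subscribe("orders")     # stands in for a JS client
    processor_inbox = broker.subscribe("orders")  # stands in for an intermediate processor
    alerts_inbox = broker.subscribe("alerts")     # a subscriber on a different topic

    broker.publish("orders", {"id": 1, "total": 40})

    client_msg = await client_inbox.get()
    processor_msg = await processor_inbox.get()
    # The "alerts" subscriber must not see messages published to "orders".
    return client_msg, processor_msg, alerts_inbox.empty()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Whichever technology ends up behind this (a Kafka topic, a socket.io room, or something else), this is the contract the flows would need to satisfy.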

I've been looking at Kafka, Spark, and Storm in this context. I've also considered WebSockets or, better yet, socket.io. I've done some reading on the advantages and disadvantages of each. At this point I just want to pick an architectural starting point so I can focus my learning.

Do any of the experts here have a recommended starting point? I have a set of NiFi flows, some of which need to publish to a proper event stream. Where should I start?

2 REPLIES

Re: What is the best starting point for topical streaming with NIFI

New Contributor

The more I read about the JavaScript piece of my system, the more I think I need to publish streams in a way that is compatible with socket.io. Perhaps my starting point should be a NiFi processor that lets me publish to a socket.io stream.

Re: What is the best starting point for topical streaming with NIFI

Cloudera Employee

@David Sargrad

Could you use NiFi for ingestion from the source into a Kafka topic, then use Spark Streaming to read from Kafka, perform the business logic, and write to another Kafka topic? NiFi can then collect from that topic and route to the endpoint.

SAM/Storm is an option if there is a true real-time requirement.

NiFi -> Kafka -> Spark Streaming -> NiFi has the advantage of a single security and governance layer in Ranger/Atlas, if that is required, along with lineage.
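As a rough illustration of the middle step (Kafka -> Spark -> Kafka), here is a minimal PySpark sketch. It assumes Spark's Structured Streaming API with the Kafka connector package (`spark-sql-kafka-0-10`) available to the job, rather than the older DStream-based Spark Streaming. The function name `build_pipeline`, the topic names, the broker address, and the checkpoint path are hypothetical, and `upper()` is only a placeholder for the real business logic. The NiFi ends of the pipeline would typically be handled by NiFi's Kafka publish and consume processors and are not shown.

```python
def build_pipeline(spark, bootstrap_servers, source_topic, sink_topic, checkpoint_dir):
    """Read from one Kafka topic, apply a placeholder transformation, write to another.

    `spark` is expected to be an existing SparkSession.
    """
    raw = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("subscribe", source_topic)
        .load()
    )

    # Kafka delivers key and value as binary, so cast them to strings first.
    # upper() stands in for whatever business logic is actually needed.
    transformed = raw.selectExpr(
        "CAST(key AS STRING) AS key",
        "upper(CAST(value AS STRING)) AS value",
    )

    # The Kafka sink needs a target topic and a checkpoint location.
    return (
        transformed.writeStream
        .format("kafka")
        .option("kafka.bootstrap.servers", bootstrap_servers)
        .option("topic", sink_topic)
        .option("checkpointLocation", checkpoint_dir)
        .start()
    )
```

A call might look like `build_pipeline(spark, "localhost:9092", "raw-events", "processed-events", "/tmp/checkpoints/events")`, where every argument value is a placeholder. The returned streaming query keeps running until it is stopped, so the driver would normally call `awaitTermination()` on it.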