[HDF-3.0] Difference between NiFi and the Stream Builder module of Streaming Analytics Manager

Contributor

With the new version of HDF (3.0), we find NiFi and the new application: Streaming Analytics Manager. What is the main difference between NiFi and the Stream Builder module of SAM? How can/should we use these two applications together?

After quickly reading the description of this module, it seems really close to NiFi.

Thank you!

1 ACCEPTED SOLUTION

Guru

Both are similar in their awesome drag-and-drop UI for processing data in motion. However, they differ fundamentally in purpose and underlying technology.

Differences

Purpose

NiFi is meant for data flow management while Streaming Analytics Manager (SAM) is meant for advanced (complex) real-time analytics. In general, for NiFi think acquiring, transforming and routing data to target destinations and for SAM think complex analytics on data as it is flowing across the wire.

Here is a more detailed comparison between flow management (NiFi) and stream analytics (SAM):

| | Flow Management (NiFi) | Stream Analytics (SAM) |
|---|---|---|
| data velocity | batch, microbatch or streaming (from diverse sources) | streaming (from diverse sources) |
| data size (per content) | small (KB) to large (GB) | small (KB, MB) per message in stream |
| data manipulation | rich: parse, filter, join, transform, enrich, reformat | minimal changes to data |
| data flow management | powerful: queue prioritization, back pressure, route/merge, persist to target | minimal: mostly route/merge and persist to target |
| real-time analytics | basic | powerful |

So NiFi is great for managing the movement of data from diverse sources (small sensors, FTP locations, relational databases, REST APIs in the cloud, and so on) to similar targets, while modifying and making decisions on the data in between. SAM is great at watching real-time streams of data and doing advanced analytics (dashboarding/visualizations, alerting, predictions, etc.) as the data flows by.

Technology

NiFi is built around processors and connections with repositories underneath. SAM is built on top of Storm and Kafka (and Druid).
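To make the "processors and connections" idea a bit more concrete, here is a toy sketch in plain Python. It is not NiFi's API: the `Connection` class, the `backpressure_threshold` parameter, and both functions are made-up names for illustration only. It just shows the general idea of a queue between two processing steps that stops accepting work once it holds too many items (back pressure) and accepts work again once the downstream step has drained some.

```python
from collections import deque


class Connection:
    """Toy queue between two processors with an object-count threshold.

    Hypothetical illustration only; not a NiFi class.
    """

    def __init__(self, backpressure_threshold):
        self.queue = deque()
        self.backpressure_threshold = backpressure_threshold

    def is_full(self):
        return len(self.queue) >= self.backpressure_threshold


def run_upstream(source, connection):
    """Move items into the connection until back pressure applies."""
    moved = 0
    while source and not connection.is_full():
        connection.queue.append(source.popleft())
        moved += 1
    return moved


def run_downstream(connection, batch_size):
    """Drain up to batch_size items from the connection."""
    drained = []
    while connection.queue and len(drained) < batch_size:
        drained.append(connection.queue.popleft())
    return drained


source = deque(range(10))
conn = Connection(backpressure_threshold=4)

first = run_upstream(source, conn)            # stops once the queue holds 4 items
drained = run_downstream(conn, batch_size=2)  # downstream frees two slots
second = run_upstream(source, conn)           # upstream resumes and refills them
print(first, drained, second, len(source))
```

In a real flow you configure this kind of threshold on a connection in the NiFi UI rather than writing code for it.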

Shared

What do they have in common? Both offer easy UI-based development that hides the complexity underneath. Both are components of the Hortonworks Data Flow (HDF) distribution. Both share Kafka (see below). Both are managed by Ambari (administration and monitoring) and Ranger (authorization and security). Both can use the same Schema Registry to work with the data structure of content.

Do they connect?

A very common pattern is this: stream data using NiFi (and possibly filter, transform, enrich) and pass it to a Kafka queue to make it durable (persistent until consumed). SAM pulls from the queue (subscribes to a topic) and does advanced analytics from there (dashboarding/visualizations, alerting, predictions, etc.). SAM pushes to Hadoop (HBase or Hive) to persist for further historical analysis and exploration (data science, business intelligence, etc.). The tutorial mentioned by @Wynner is an excellent example of this pattern and the separate strengths of NiFi and SAM.
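As a rough sketch of how the roles split in this pattern, here is a small self-contained Python simulation. Everything in it is an assumption made for illustration: the in-memory `Topic` class stands in for a Kafka topic, `flow_stage` plays the NiFi role, `analytics_stage` plays the SAM role, and the truck speed events, window size, and threshold are invented sample values. It does not use any real NiFi, Kafka, or SAM API.

```python
from collections import defaultdict, deque


class Topic:
    """Hypothetical in-memory stand-in for a Kafka topic (an append-only log)."""

    def __init__(self):
        self.log = []

    def publish(self, message):
        self.log.append(message)

    def read_from(self, offset):
        return self.log[offset:]


def flow_stage(raw_events, topic):
    """NiFi-like role: filter, enrich, and route events to the topic."""
    for event in raw_events:
        if event.get("speed") is None:  # filter out malformed records
            continue
        enriched = dict(event, source="truck-sensor")  # enrich with a tag
        topic.publish(enriched)


def analytics_stage(topic, window_size=3, threshold=80.0):
    """SAM-like role: sliding-window average per truck, with an alert rule."""
    windows = defaultdict(lambda: deque(maxlen=window_size))
    alerts = []
    for msg in topic.read_from(0):
        window = windows[msg["truck"]]
        window.append(msg["speed"])
        if len(window) == window_size:
            avg = sum(window) / window_size
            if avg > threshold:
                alerts.append((msg["truck"], round(avg, 1)))
    return alerts


# Invented sample data: one record is malformed and gets filtered out.
raw = [
    {"truck": "A", "speed": 70},
    {"truck": "A", "speed": None},
    {"truck": "A", "speed": 90},
    {"truck": "B", "speed": 50},
    {"truck": "A", "speed": 95},
    {"truck": "B", "speed": 55},
    {"truck": "B", "speed": 60},
]
topic = Topic()
flow_stage(raw, topic)
alerts = analytics_stage(topic)
print(alerts)
```

The point of the split is that the flow stage only cleans and routes, the topic decouples the two sides, and the analytics stage only reasons over the stream (here, a windowed average and an alert).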


3 REPLIES

@Quentin T

Here is a link to a tutorial that will help show how these tools work together and what role they play.

REAL-TIME EVENT PROCESSING IN NIFI, SAM, SCHEMA REGISTRY AND SUPERSET

Contributor

Thanks a lot for this detailed response! @Greg Keys