Support Questions

quentin_toulou · ‎07-28-2017

With the new version of HDF (3.0), we get NiFi and a new application: Streaming Analytics Manager (SAM). What is the main difference between NiFi and the Stream Builder module of SAM? How can/should we use these two applications together?

After a quick read of this module's description, it seems really close to NiFi.

Thank you !

gkeys · ‎07-28-2017

Both are similar in their awesome drag-and-drop UI for processing data in motion. However, they differ fundamentally in purpose and underlying technology.

Differences

Purpose

NiFi is meant for data flow management, while Streaming Analytics Manager (SAM) is meant for advanced (complex) real-time analytics. In general, for NiFi think acquiring, transforming, and routing data to target destinations; for SAM think complex analytics on data as it flows across the wire.

Here is a more detailed comparison between flow management (NiFi) and stream analytics (SAM):

Aspect	Flow Management (NiFi)	Stream Analytics (SAM)
data velocity	batch, microbatch, or streaming (from diverse sources)	streaming (from diverse sources)
data size (per content)	small (KB) to large (GB)	small (KB, MB) per message in stream
data manipulation	rich: parse, filter, join, transform, enrich, reformat	minimal changes to data
data flow management	powerful: queue prioritization, back pressure, route/merge, persist to target	minimal: mostly route/merge and persist to target
real-time analytics	basic	powerful

So NiFi is great for managing the movement of data from diverse sources (small sensors, FTP locations, relational databases, REST APIs in the cloud, and so on) to similarly diverse targets, while modifying the data and making decisions on it along the way. SAM is great at watching real-time streams of data and doing advanced analytics (dashboarding/visualizations, alerting, predictions, etc.) as the data flows by.
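To make the distinction concrete, here is a minimal sketch in plain Python. It does not use NiFi or SAM at all: the sensor events, the threshold, and the function names are all made up for illustration. The first function makes a decision on each record individually (closer to what a flow-management tool does), while the second aggregates records over fixed time windows (closer to what a stream-analytics tool does).

```python
from collections import defaultdict

# Hypothetical sensor events: (timestamp_seconds, sensor_id, temperature)
events = [
    (0, "a", 20.0),
    (3, "b", 35.0),
    (7, "a", 22.0),
    (12, "a", 41.0),
    (14, "b", 39.0),
]


def route(event, threshold=30.0):
    """Flow-management style: decide per record where it should go."""
    _, _, temperature = event
    return "hot" if temperature > threshold else "normal"


def tumbling_average(events, window_seconds=10):
    """Stream-analytics style: average the records in each fixed time window."""
    buckets = defaultdict(list)
    for timestamp, _, temperature in events:
        # Align each event to the start of the window it falls into.
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(temperature)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}


routes = [route(e) for e in events]   # one routing decision per record
averages = tumbling_average(events)   # one result per 10-second window
```

The routing function never looks at more than one record, whereas the windowed average only makes sense across many records. That difference is roughly the line between the two tools.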

Technology

NiFi is built around processors and connections with repositories underneath. SAM is built on top of Storm and Kafka (and Druid).

Shared

What do they have in common? Both offer easy UI-based development that hides the complexity underneath. Both are components of the Hortonworks DataFlow (HDF) distribution. Both share Kafka (see below). Both are managed by Ambari (admin and monitoring) and Ranger (authorization and security). Both can use the same Schema Registry to work with the data structure of the content.
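The shared-schema point can be illustrated with a toy sketch. This is not the Schema Registry API: the in-memory `REGISTRY` dict, the `truck_events` schema name, and the helper functions are assumptions made up for this example. The idea it shows is only that the producing side and the consuming side look up one schema definition instead of each keeping its own copy.

```python
# Hypothetical in-memory stand-in for a schema registry:
# (schema name, version) -> {field name: expected Python type}
REGISTRY = {
    ("truck_events", 1): {"driver": str, "speed": int},
}


def get_schema(name, version):
    """Look up a schema by name and version."""
    return REGISTRY[(name, version)]


def conforms(record, schema):
    """Return True if the record has exactly the schema's fields and types."""
    if set(record) != set(schema):
        return False
    return all(isinstance(record[field], kind) for field, kind in schema.items())


# Both sides of the pipeline fetch the same definition.
schema = get_schema("truck_events", 1)

good = {"driver": "ann", "speed": 95}
bad = {"driver": "bob", "speed": "fast"}   # wrong type for "speed"
```

With this setup, `conforms(good, schema)` is true and `conforms(bad, schema)` is false, whichever side of the pipeline runs the check.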

Do they connect?

A very common pattern is this: stream data using NiFi (and possibly filter, transform, and enrich it) and pass it to a Kafka queue to make it durable (persisted on the topic until consumers are ready to read it). SAM pulls from the queue (subscribes to a topic) and does advanced analytics from there (dashboarding/visualizations, alerting, predictions, etc.). SAM then pushes to Hadoop (HBase or Hive) to persist the data for further historical analysis and exploration (data science, business intelligence, etc.).

The tutorial mentioned by @Wynner is an excellent example of this pattern and of the separate strengths of NiFi and SAM.
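The shape of that pattern can be sketched in plain Python. This is a toy simulation under stated assumptions, not NiFi, Kafka, or SAM code: a `deque` stands in for the Kafka topic, a list stands in for HBase/Hive, and the record fields, the `truck-sensor` tag, and the speed limit are hypothetical.

```python
from collections import deque


def nifi_stage(raw_records):
    """Stand-in for the NiFi flow: filter, transform, and enrich each record."""
    for record in raw_records:
        if record.get("speed") is None:      # filter: drop incomplete records
            continue
        yield {
            "driver": record["driver"].strip().lower(),  # transform: normalize
            "speed": int(record["speed"]),               # transform: parse number
            "source": "truck-sensor",                    # enrich: hypothetical tag
        }


def sam_stage(topic, sink, speed_limit=80):
    """Stand-in for the SAM app: consume the topic, alert, then persist."""
    alerts = []
    while topic:
        event = topic.popleft()              # consume in arrival order
        if event["speed"] > speed_limit:
            alerts.append(f"{event['driver']} speeding at {event['speed']}")
        sink.append(event)                   # persist for historical analysis
    return alerts


raw = [
    {"driver": " Ann ", "speed": "95"},
    {"driver": "Bob", "speed": None},
    {"driver": "Cy", "speed": "60"},
]

topic = deque(nifi_stage(raw))   # stand-in for the Kafka topic
sink = []                        # stand-in for HBase or Hive
alerts = sam_stage(topic, sink)
```

One simplification to keep in mind: popping from the `deque` removes the message, whereas a real Kafka topic keeps messages according to its retention settings, so other consumers can still read them.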

Wynner · ‎07-28-2017

@Quentin T

Here is a link to a tutorial that will help show how these tools work together and what role they play.

REAL-TIME EVENT PROCESSING IN NIFI, SAM, SCHEMA REGISTRY AND SUPERSET

quentin_toulou · ‎07-31-2017

Thanks a lot for this detailed response ! @Greg Keys

Cloudera Community

Support Questions

[HDF-3.0] Difference between Nifi and Stream builder module of Streaming analytics manager
