Created 07-28-2017 01:15 PM
The new version of HDF (3.0) includes NiFi and a new application, Streaming Analytics Manager (SAM). What is the main difference between NiFi and the Stream Builder module of SAM? How can/should we use these two applications together?
After a quick read of the module's description, it seems really close to NiFi.
Thank you!
Created 07-28-2017 08:15 PM
Both have an awesome drag-and-drop UI for processing data in motion. However, they differ fundamentally in purpose and underlying technology.
Purpose
NiFi is meant for data flow management, while Streaming Analytics Manager (SAM) is meant for advanced (complex) real-time analytics. In general, for NiFi think acquiring, transforming, and routing data to target destinations; for SAM think complex analytics on data as it flows across the wire.
Here is a more detailed comparison between flow management (NiFi) and stream analytics (SAM):
Aspect | Flow Management (NiFi) | Stream Analytics (SAM) |
data velocity | batch, microbatch or streaming (from diverse sources) | streaming (from diverse sources) |
data size (per content) | small (KB) to large (GB) | small (KB, MB) per message in stream |
data manipulation | rich: parse, filter, join, transform, enrich, reformat | minimal changes to data |
data flow management | powerful: queue prioritization, back pressure, route/merge, persist to target | minimal: mostly route/merge and persist to target |
real-time analytics | basic | powerful |
So NiFi is great at managing the movement of data from diverse sources (small sensors, FTP locations, relational databases, REST APIs in the cloud, and so on) to similarly diverse targets, while modifying the data and making decisions on it in between. SAM is great at watching real-time streams of data and doing advanced analytics (dashboarding/visualizations, alerting, predictions, etc.) as the data flows by.
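To make the "queue prioritization, back pressure" row a bit more concrete, here is a toy model in plain Python. It is not NiFi code and uses no NiFi API; the `Connection` class, its `offer`/`poll` methods, and the threshold value are all hypothetical names chosen for illustration. The idea it sketches: a queue between two steps refuses new items once it reaches a threshold (so the upstream step has to wait), and hands the most urgent item downstream first.

```python
import heapq
from itertools import count


class Connection:
    """Toy stand-in for a queue between two processing steps (hypothetical)."""

    def __init__(self, back_pressure_threshold):
        self.back_pressure_threshold = back_pressure_threshold
        self._heap = []
        self._order = count()  # tie-breaker keeps FIFO order within a priority

    def is_full(self):
        return len(self._heap) >= self.back_pressure_threshold

    def offer(self, priority, item):
        """Return False (back pressure) instead of accepting when full."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Hand the most urgent item (lowest priority number) downstream."""
        _priority, _seq, item = heapq.heappop(self._heap)
        return item


conn = Connection(back_pressure_threshold=2)
accepted = [
    conn.offer(5, "bulk file"),
    conn.offer(1, "sensor alert"),
    conn.offer(3, "log line"),  # refused: the queue already holds 2 items
]
first_out = conn.poll()  # the priority-1 item leaves first
```

Once `poll` has drained an item, a retried `offer` is accepted again, which is the "upstream resumes when downstream catches up" behaviour in miniature.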
Technology
NiFi is built around processors and connections with repositories underneath. SAM is built on top of Storm and Kafka (and Druid).
What do they have in common? Both have easy UI development that hides complexity underneath. Both are components of the Hortonworks DataFlow (HDF) distribution. Both share Kafka (see below). Both are managed by Ambari (admin and monitoring) and Ranger (authorization and security). Both can use the same Schema Registry to work with the data structure of the content.
A very common pattern is this: stream data using NiFi (and possibly filter, transform, enrich) and pass it to a Kafka topic to make it durable (persistent until consumed). SAM pulls from Kafka (subscribes to the topic) and does advanced analytics from there (dashboarding/visualizations, alerting, predictions, etc.). SAM then pushes to Hadoop (HBase or Hive) to persist the data for further historical analysis and exploration (data science, business intelligence, etc.). The tutorial mentioned by @Wynner is an excellent example of this pattern and of the separate strengths of NiFi and SAM.
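The division of labour in this pattern can be sketched in a few lines of plain Python. This is only a toy simulation: no NiFi, Kafka, or SAM API is used, the in-memory `deque` stands in for a Kafka topic, and the `ingest`/`analyze` functions, the `speed` field, and the speed limit are hypothetical names and values invented for illustration.

```python
from collections import deque
from statistics import mean


def ingest(raw_events):
    """Flow-management step: drop malformed records and enrich the rest."""
    for event in raw_events:
        if "speed" not in event:
            continue  # filter: skip records without the expected field
        yield {**event, "unit": "mph"}  # enrich: attach a unit


def analyze(topic, window_size, speed_limit):
    """Stream-analytics step: average speed per fixed-size window of events,
    collecting an alert whenever a window's average exceeds the limit."""
    alerts = []
    window = []
    while topic:
        window.append(topic.popleft())  # consume from the "topic"
        if len(window) == window_size:
            avg = mean(e["speed"] for e in window)
            if avg > speed_limit:
                alerts.append(avg)
            window = []
    return alerts


topic = deque()  # in-memory stand-in for a Kafka topic

raw = [{"speed": 60}, {"bad": True}, {"speed": 90}, {"speed": 85}, {"speed": 95}]
topic.extend(ingest(raw))  # the ingest side publishes cleaned events
alerts = analyze(topic, window_size=2, speed_limit=80)  # the analytics side consumes
```

The ingest side only cleans and forwards individual records, while the analytics side reasons over groups of records (windows). In the real stack the queue in the middle is what lets the two sides run and scale independently.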
Created 07-28-2017 06:50 PM
Here is a link to a tutorial that will help show how these tools work together and what role they play.
REAL-TIME EVENT PROCESSING IN NIFI, SAM, SCHEMA REGISTRY AND SUPERSET
Created 07-31-2017 08:00 AM
Thanks a lot for this detailed response, @Greg Keys!