Created 03-22-2016 09:16 AM
Created 03-28-2016 07:40 PM
Context: I just came to Hortonworks after 6 years at Pentaho, the company that controls the Kettle project. I agree with everything already said, but will add:
- Kettle is a batch-oriented extract-transform-load (ETL) tool, primarily used for loading data warehouses and data marts. It competes with tools such as Informatica, Talend, and DataStage, but Kettle is the only traditional ETL tool that runs inside Hadoop as a native MapReduce or YARN job.
- you can do micro-batches (e.g. run a transform every few seconds or minutes), but it's not really intended for streaming
- the only true streaming capability is to use the Java Message Service (JMS) consumer and producer steps to connect to a JMS-compliant data bus
- it has open source "big data" connectors for reading from and writing to HDFS, MongoDB, Cassandra, and Splunk
- it is multi-threaded and performance is generally very good (e.g. one performance test processed 12K rows/second/core). It tends to scale up linearly with added cores. It also scales out linearly through J2EE application server clustering.
- Kettle is a Java runtime engine, and can run natively inside Hadoop as a MapReduce or YARN job
- it has pretty nice workflow capabilities that Pentaho touts as an alternative to using Oozie, but it also has an Oozie workflow job step
I'm new to NiFi so can't really contrast the two right now, but hopefully this information is useful.
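
To make the micro-batch point above a bit more concrete, here is a minimal Python sketch of the pattern: on a fixed interval, pick up whatever rows have accumulated, transform them, and load them. This is not Kettle code or a Kettle API. The names `run_micro_batches`, `poll`, `transform`, and `load` are hypothetical and exist only for illustration, and the `sleep` parameter is injectable purely so the loop can be exercised without waiting.

```python
import time
from typing import Callable, List


def run_micro_batches(
    poll: Callable[[], List[dict]],
    transform: Callable[[List[dict]], List[dict]],
    load: Callable[[List[dict]], None],
    interval_seconds: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `transform` once per interval on the rows that arrived since the last run.

    Returns the total number of rows processed. All names are hypothetical.
    """
    total = 0
    for run in range(max_runs):
        rows = poll()  # everything that accumulated since the previous run
        if rows:
            load(transform(rows))
            total += len(rows)
        if run < max_runs - 1:
            sleep(interval_seconds)  # rows arriving now wait for the next run
    return total
```

The latency floor is the interval itself: a row that arrives just after a run starts waits until the next one, which is why this approach approximates streaming rather than providing it.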
Created 03-22-2016 04:58 PM
Well, it's a super-loaded question, but I'll try to highlight the most important differences and give some food for thought:
Created 03-24-2016 01:54 PM
Kettle is primarily an ETL tool designed to load static data from one source into another. NiFi is certainly capable of similar kinds of tasks, but its main focus is dealing with really fast flows of real-time events. NiFi can run as a very small single-instance JVM suitable to act as a data collection agent for an endpoint, and it can also scale through clustering to handle very large volumes of data from lots of endpoints. Once a cluster is up and running, changes can be made dynamically, without a redeploy or even much of a disruption to the data flows. For example, suppose an endpoint in the field sends out events in a JSON format, but the application back at the data center now expects a JSON object with more fields than before and is listening on a different IP and port in a different data center. NiFi can capture the event in the field, then transform it and direct it to the correct listener in the required format without coding, redeployment, or even much of a disruption to the data flow. The best part is that the entire flow is tracked, and every modification or action on the event is visible and searchable. This makes it easy to account for and troubleshoot any issues that occur in transit.
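
As a rough illustration of what that example amounts to, here is a small Python sketch. In NiFi this would be configured in the flow rather than coded, so nothing below is NiFi's API. `FlowConfig`, `process_event`, the field names, and the addresses are all hypothetical. The point is only that the destination and the extra fields live in configuration that can change while events keep flowing.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FlowConfig:
    """Hypothetical runtime-editable settings for one flow."""
    host: str
    port: int
    extra_fields: Dict[str, object] = field(default_factory=dict)


def process_event(raw_event: str, config: FlowConfig) -> Tuple[Tuple[str, int], str]:
    """Add any missing fields to a JSON event and pick its destination.

    Returns ((host, port), payload). Fields already present on the event win.
    """
    event = json.loads(raw_event)
    for key, value in config.extra_fields.items():
        event.setdefault(key, value)
    return (config.host, config.port), json.dumps(event, sort_keys=True)


# Hypothetical usage: the same function keeps running while the config changes.
config = FlowConfig(host="10.0.0.5", port=9000)
before = process_event('{"sensor": "a1", "temp": 21.5}', config)

config.host, config.port = "10.1.0.9", 9100  # listener moved to another data center
config.extra_fields = {"site": "unknown", "schema_version": 2}  # app now expects more fields
after = process_event('{"sensor": "a1", "temp": 21.5}', config)
```

What this sketch leaves out is the part described as the best feature: a searchable record of every change made to each event in transit.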
Created 09-02-2016 03:37 AM
So can NiFi be said to be uniquely placed in the systems integration space, as something very similar to Microsoft BizTalk?