03-28-2016 07:40 PM · 5 Kudos
Context: I just came to Hortonworks after 6 years at Pentaho, the company behind the Kettle project. I agree with everything already said, but will add:

- Kettle is a batch-oriented extract-transform-load (ETL) tool, primarily used for loading data warehouses/marts. It competes with tools such as Informatica, Talend, and DataStage, but Kettle is the only traditional ETL tool that runs inside Hadoop as native MapReduce or YARN.
- You can do micro-batches (e.g. run a transformation every few seconds/minutes), but it's not really intended for streaming (a scheduling sketch follows this list).
- The only true streaming capability is to use the Java Message Service (JMS) consumer and producer steps to connect to a JMS-compliant message bus (see the JMS sketch below).
- It has open source "big data" connectors for reading from and writing to HDFS, MongoDB, Cassandra, and Splunk (an HDFS sketch follows below).
- It is multi-threaded and performance is generally very good (e.g. one performance test processed 12K rows/second/core). It tends to scale up linearly with added cores, and it also scales out through J2EE application-server clustering.
- Kettle is a Java runtime engine and can run natively inside Hadoop as a MapReduce or YARN job.
- It has pretty nice workflow capabilities that Pentaho touts as an alternative to using Oozie, and it also ships an Oozie workflow job step (a job-launching sketch follows below).

I'm new to NiFi so I can't really contrast the two right now, but hopefully this information is useful.
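To illustrate the micro-batch point: a minimal sketch of driving a transformation on a timer from Kettle's embedding API (the `org.pentaho.di` classes). The file path `/etl/incremental_load.ktr` and the 30-second interval are illustrative assumptions, not anything Kettle ships with:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.trans.Trans;
import org.pentaho.di.trans.TransMeta;

public class MicroBatchRunner {

    public static void main(String[] args) throws Exception {
        // Initialize the Kettle engine once per JVM.
        KettleEnvironment.init();

        // Hypothetical transformation file; substitute your own .ktr.
        TransMeta transMeta = new TransMeta("/etl/incremental_load.ktr");

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

        // Re-run the whole transformation every 30 seconds:
        // micro-batching, not streaming -- each run starts from scratch.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                Trans trans = new Trans(transMeta);
                trans.execute(null);      // null = no command-line arguments
                trans.waitUntilFinished();
                if (trans.getErrors() > 0) {
                    System.err.println("Batch finished with errors");
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 30, TimeUnit.SECONDS);
    }
}
```

In practice you would more likely schedule this through a Kettle job or Carte, but the loop shows why this counts as micro-batching rather than streaming.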
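For contrast, record-at-a-time streaming over JMS looks roughly like the sketch below, written against the plain `javax.jms` API (not Kettle's step internals). ActiveMQ is an assumed broker, and the URL `tcp://localhost:61616` and queue name `events` are placeholders:

```java
import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageConsumer;
import javax.jms.Queue;
import javax.jms.Session;
import javax.jms.TextMessage;

import org.apache.activemq.ActiveMQConnectionFactory;

public class JmsTail {

    public static void main(String[] args) throws Exception {
        // Hypothetical broker URL; point at your own JMS provider.
        ConnectionFactory factory = new ActiveMQConnectionFactory("tcp://localhost:61616");
        Connection connection = factory.createConnection();
        connection.start();

        Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
        Queue queue = session.createQueue("events");
        MessageConsumer consumer = session.createConsumer(queue);

        // Block on each message as it arrives -- true streaming,
        // one record at a time. Assumes text messages for simplicity.
        while (true) {
            TextMessage message = (TextMessage) consumer.receive();
            System.out.println("received: " + message.getText());
        }
    }
}
```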
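On the HDFS connectors: the kind of write an HDFS output step performs can be sketched with the standard `org.apache.hadoop.fs` client API. The NameNode address and output path here are assumptions:

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical NameNode address; point at your own cluster.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        // Write a small CSV file into HDFS.
        try (FileSystem fs = FileSystem.get(conf);
             FSDataOutputStream out = fs.create(new Path("/data/out/rows.csv"));
             BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(out, StandardCharsets.UTF_8))) {
            writer.write("id,name\n");
            writer.write("1,example\n");
        }
    }
}
```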
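And on workflow: a Kettle job (.kjb) can be launched from Java much like a transformation, which is part of what makes it usable as an Oozie-style orchestrator or callable from other schedulers. A minimal sketch, assuming a job file at the hypothetical path `/etl/nightly_load.kjb`:

```java
import org.pentaho.di.core.KettleEnvironment;
import org.pentaho.di.job.Job;
import org.pentaho.di.job.JobMeta;

public class JobRunner {

    public static void main(String[] args) throws Exception {
        KettleEnvironment.init();

        // Hypothetical job file; a .kjb orchestrates transformations
        // much like an Oozie workflow orchestrates actions.
        // null = no repository; the job is loaded from the file system.
        JobMeta jobMeta = new JobMeta("/etl/nightly_load.kjb", null);

        Job job = new Job(null, jobMeta);
        job.start();
        job.waitUntilFinished();

        // Propagate success or failure to the calling scheduler.
        System.exit(job.getErrors() == 0 ? 0 : 1);
    }
}
```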