Difference between Kafka and Sqoop
Created ‎05-29-2017 05:57 AM
Sqoop, Flume, and Kafka are all used to import data from legacy systems into Hadoop. What is the difference between them, and how do you decide which component to use in which situation?
Created ‎05-29-2017 03:05 PM
Hi @heta desai
You could use HDF (NiFi) as your primary ingestion tool and not have to worry about the other options. That said, Sqoop is primarily used to move data between an existing RDBMS and Hadoop (in either direction). Flume was previously the main tool for ingesting flat files, CSVs, etc., but it has fallen out of favour and is often replaced by HDF/NiFi now. Kafka is a distributed messaging system that can be used as a pub/sub layer for data ingest, including streaming. So all three are a bit different. The right tool depends on your use case, but as I said, HDF/NiFi can pretty much cover the gamut, so if you are starting out now, you may want to look at that first. Here is another good write-up on the same subject:
https://community.hortonworks.com/questions/23337/best-tools-to-ingest-data-to-hadoop.html
As always, if you find this post useful, please accept the answer.
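To make the difference concrete, here is a rough sketch of what each tool's entry point looks like. The JDBC URL, credentials, table, broker, and topic names below are placeholders made up for this example; adjust them to your environment.

```shell
# Sqoop: batch-pull a table from an existing RDBMS into HDFS.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/raw/orders

# Kafka: streaming pub/sub. A producer writes events to a topic...
kafka-console-producer.sh --broker-list broker1:9092 --topic clickstream

# ...and any number of consumers read from that topic independently,
# each tracking its own offset.
kafka-console-consumer.sh --bootstrap-server broker1:9092 \
  --topic clickstream --from-beginning
```

Note the shape of each: Sqoop is a one-shot job against a database table, while Kafka decouples producers from consumers so data can flow continuously.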
