Difference between Kafka and Sqoop

Expert Contributor

Sqoop, Flume, and Kafka are all used to import data from legacy systems into Hadoop. What is the difference between them, and how do I choose which component to use in which situation?

1 ACCEPTED SOLUTION

Guru

Hi @heta desai

You could use HDF (NiFi) as your primary ingestion tool and not necessarily have to worry about the other options. That said, Sqoop is primarily used to move data from an existing RDBMS to Hadoop (or vice versa). Flume was previously the main tool for ingesting flat files, CSV, etc., but it has fallen out of favour and is now often replaced with HDF/NiFi. Kafka is a distributed messaging system that can be used in a pub/sub model for data ingest, including streaming. So all three are a bit different. The right tool for the job depends on your use case, but as I said, HDF/NiFi can pretty much cover the gamut, so if you are starting out now, you may want to look at that first. Here is another good write-up on the same subject:

https://community.hortonworks.com/questions/23337/best-tools-to-ingest-data-to-hadoop.html
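
To make the batch vs. pub/sub distinction above a bit more concrete, here is a rough Python sketch. It does not use Sqoop or Kafka at all: `batch_import` and `ToyTopic` are hypothetical names made up for illustration, SQLite stands in for the RDBMS, and an in-memory list stands in for a topic. It only shows the shape of each model, namely a one-shot bulk copy of a table versus an append-only stream that each subscriber reads at its own pace.

```python
import sqlite3
from collections import defaultdict


def batch_import(conn, table):
    """Sqoop-style idea (toy only): copy a whole RDBMS table in one shot.

    The table name is trusted here; this is an illustration, not production code.
    """
    return conn.execute(f"SELECT * FROM {table} ORDER BY rowid").fetchall()


class ToyTopic:
    """Kafka-style idea (toy only): an append-only log with per-consumer offsets."""

    def __init__(self):
        self.log = []                    # messages in arrival order
        self.offsets = defaultdict(int)  # next position to read, per consumer

    def publish(self, message):
        self.log.append(message)

    def poll(self, consumer):
        # Return everything this consumer has not seen yet, then advance its offset.
        start = self.offsets[consumer]
        new_messages = self.log[start:]
        self.offsets[consumer] = len(self.log)
        return new_messages


# Batch: take a snapshot of an existing table (SQLite stands in for the RDBMS).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, item TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, "disk"), (2, "cable")])
snapshot = batch_import(conn, "orders")

# Streaming: messages arrive over time and subscribers read independently.
topic = ToyTopic()
topic.publish("click-1")
first = topic.poll("hdfs-writer")   # sees only what has arrived so far
topic.publish("click-2")
topic.publish("click-3")
second = topic.poll("hdfs-writer")  # sees only the new messages
late = topic.poll("dashboard")      # a new subscriber starts from the beginning
```

The batch function returns a point-in-time copy and is done, while the topic keeps accepting messages and lets several consumers read the same data independently. That is roughly why an RDBMS import job and a streaming feed end up being handled by different tools.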

As always, if you find this post useful, please accept the answer.
