
Will data be updated automatically in Kafka and Flume?

Explorer

I am working with Kafka and Flume.

Once we import data from an RDBMS into Hadoop (or another system), if the data is later updated in the RDBMS, will the data in Hadoop be updated automatically, or do we have to update it manually?

Can you describe when to use Kafka, Flume, and Sqoop?

What is the difference between the three of them?

5 REPLIES

Super Guru
@priyal patel

You can simply use Sqoop incremental imports for this use case. You don't need Flume or Kafka for this.

https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports
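
For example, a minimal incremental import in append mode might look like the sketch below; the connection string, table, directory, and column names are placeholders for your own environment.

# import only rows whose order_id is greater than the stored last-value
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username sqoop_user -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental append \
  --check-column order_id \
  --last-value 0

On each subsequent run, pass the last-value printed by the previous run (or let a saved sqoop job track it for you) so only new rows are appended.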

Explorer

If I want to import streaming data into Hadoop or any other system, which one should I use: Kafka or Flume?

If I have a flat file, which one is better to use: Kafka or Flume?

Can you please explain what the difference is between Kafka, Flume, and Sqoop?

When should I use Kafka, Flume, and Sqoop?

Explorer

@priyal patel

Sqoop - Sqoop is used to move data from an existing RDBMS to Hadoop (or vice versa).

Once the initial load has been performed with Sqoop, the incremental data (i.e. rows that are later inserted or updated in the RDBMS) is not brought over automatically. It needs to be imported using Sqoop incremental imports, as in the sketch after the link below.

https://sqoop.apache.org/docs/1.4.2/SqoopUserGuide.html#_incremental_imports
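
As a rough sketch (the connection details and column names here are placeholders), an incremental import that also picks up updated rows can use lastmodified mode with a merge key:

# re-import rows whose last_updated timestamp is newer than --last-value
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username sqoop_user -P \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental lastmodified \
  --check-column last_updated \
  --last-value "2016-01-01 00:00:00" \
  --merge-key order_id

Rows changed since the given timestamp are re-imported, and --merge-key reconciles them with the copies already in HDFS.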

Flume - Flume was previously the main tool for ingesting log files, events, flat files, CSV, etc. It has recently fallen out of favour and is often being replaced with HDF (Hortonworks DataFlow)/NiFi.
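
For reference, a minimal Flume agent that drains a spooling directory of flat files into HDFS is just a properties file; the agent name, paths, and capacity below are illustrative only.

# illustrative agent definition; adjust names and paths to your cluster
a1.sources  = src1
a1.channels = ch1
a1.sinks    = sink1

a1.sources.src1.type     = spooldir
a1.sources.src1.spoolDir = /var/log/incoming
a1.sources.src1.channels = ch1

a1.channels.ch1.type     = memory
a1.channels.ch1.capacity = 10000

a1.sinks.sink1.type      = hdfs
a1.sinks.sink1.hdfs.path = /data/flume/events
a1.sinks.sink1.channel   = ch1

It would be started with something like: flume-ng agent --conf conf --conf-file flume.conf --name a1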

HDF (Hortonworks DataFlow)/NiFi - NiFi provides a visual user interface with more than 180 processors for collecting data from various sources (sensors, geo-location devices, machines, logs, files, feeds, etc.), performing simple event processing (e.g. parsing, filtering), and delivering it to storage platforms such as HDP in a secure environment.

Kafka - Kafka is a distributed, fault-tolerant messaging system that lets you publish and subscribe to streams of records. It is generally used for real-time stream processing, where messages are buffered in Kafka and consumed by Storm or Spark Streaming.
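
As a quick illustration of the publish/subscribe model (the broker address and topic name are placeholders, and the exact flags vary slightly between Kafka versions):

# publish messages typed on stdin to a topic
kafka-console-producer.sh --broker-list localhost:9092 --topic events

# subscribe to the same topic from another shell, reading from the beginning
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic events --from-beginning

In a real pipeline the consumer would typically be a Storm topology or a Spark Streaming job rather than the console consumer.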

The right tool for the job depends on your use case.

Here is another good write-up on the same subject:

https://community.hortonworks.com/questions/23337/best-tools-to-ingest-data-to-hadoop.html

As always, if you find this post useful, please accept the answer.

Explorer

@priyal patel Does the explanation provided address your question? If so, please "accept" the answer to close the posting.
