I am working with Kafka and Flume.
Once we import data from an RDBMS into Hadoop (or another system), and that data is later updated in the RDBMS, will the data in Hadoop be updated automatically, or do we have to update it manually?
Can you describe when to use Kafka, Flume, and Sqoop? What is the difference between the three?
If I want to import streaming data into Hadoop (or any other system), which one should I use: Kafka or Flume?
If I have a flat file, which is the better choice: Kafka or Flume?
To restate the questions: what is the difference between Kafka, Flume, and Sqoop, and when should each one be used?
Sqoop - Sqoop is used to move data between an existing RDBMS and Hadoop (in either direction).
Once the initial load has been performed with Sqoop, incremental data (rows inserted or updated in the RDBMS afterwards) is not picked up automatically. It must be imported explicitly using Sqoop's incremental imports.
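As a rough sketch, an incremental append import looks like this. The connection string, table, column names, and last value below are placeholders for illustration, not values from your environment:

```shell
# Hypothetical example: import only rows whose "id" exceeds the last
# value already imported. Host, database, user, and table are placeholders.
sqoop import \
  --connect jdbc:mysql://dbhost/sales \
  --username etl_user -P \
  --table orders \
  --target-dir /data/orders \
  --incremental append \
  --check-column id \
  --last-value 100000
```

For rows that are updated (not just appended), `--incremental lastmodified` with a timestamp check column is the usual choice. Sqoop prints the new `--last-value` at the end of each run, which you feed into the next run (or let a saved Sqoop job track for you).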
Flume - Flume was previously the main tool for ingesting log files, events, flat files, CSVs, etc. It has recently fallen out of favour and is often replaced with HDF (Hortonworks DataFlow) / NiFi.
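For reference, a Flume pipeline is described entirely in a properties file: a source feeds a channel, and a sink drains it. A minimal sketch that tails a log file into HDFS might look like this (agent name, file path, and HDFS path are illustrative):

```
# Flume agent "a1": exec source -> memory channel -> HDFS sink.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: tail an application log (path is a placeholder).
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/app.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink.
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events to date-partitioned HDFS directories.
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /data/logs/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.channel = c1
```

This also answers the flat-file question above: Flume (or NiFi) is the natural fit for ingesting files, while Kafka is a message bus that something still has to publish into.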
HDF (Hortonworks DataFlow) / NiFi - NiFi provides a visual user interface with more than 180 processors for collecting data from various sources (sensors, geo-location devices, machines, logs, files, feeds, etc.), performing simple event processing (e.g. parsing, filtering), and delivering the data to storage platforms such as HDP, all in a secure environment.
Kafka - Kafka is a distributed, fault-tolerant messaging system that lets you publish and subscribe to streams of records. It is generally used in real-time stream processing, where messages are buffered in Kafka and consumed by Storm or Spark Streaming.
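The publish/subscribe flow can be tried from the command line with the console tools that ship with Kafka. The broker address and topic name below are placeholders:

```shell
# Create a topic for the stream (broker and topic are placeholders).
kafka-topics.sh --create --topic web-clicks \
  --bootstrap-server broker1:9092 --partitions 3 --replication-factor 2

# Publish: applications (or the console producer) write records to the topic.
echo "user42,page:/home" | kafka-console-producer.sh \
  --topic web-clicks --bootstrap-server broker1:9092

# Subscribe: a stream processor (Storm / Spark Streaming) or the console
# consumer reads the buffered records, from the beginning here.
kafka-console-consumer.sh --topic web-clicks \
  --bootstrap-server broker1:9092 --from-beginning
```

Because Kafka retains records for a configurable period, multiple consumers can read the same stream independently, which is what makes it a good buffer in front of stream-processing engines.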
The right tool for the job depends on your use case.
As always, if you find this post useful, please accept the answer.