Registered: ‎10-06-2017

Architecture advice needed: how to use Kafka with Spark

I am trying to build a recommendation solution with the pipeline Web App -> Kafka -> Flume? -> Spark -> Flume? -> Kafka -> Web App.

1) What are the benefits or disadvantages of having Flume between Kafka and Spark for data ingestion?

2) What are the benefits or disadvantages of having Flume between Spark and Kafka after the analytics step?

3) What are the advantages of running Kafka inside Cloudera Enterprise Data Hub rather than outside it (on a dedicated Kafka server)?

Registered: ‎07-16-2015

Re: Architecture advice needed: how to use Kafka with Spark

Hi,

 

1) Spark can consume messages from Kafka directly. So basically, why add an extra layer between them?

2) Same question: Spark can also write its results to Kafka directly.

3) Kafka can be "inside your Enterprise Data Hub" and still run on dedicated nodes :) but I guess it is just a matter of what you consider your "Enterprise Data Hub". As for the other part: why dedicated nodes? Performance.

Kafka (and its ZooKeeper cluster) is I/O heavy and can be slowed down if other services use the same disks.
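
To illustrate points 1 and 2, here is a minimal PySpark sketch of Spark reading from and writing to Kafka without Flume in between. It assumes a Spark version that ships the Structured Streaming Kafka integration and that the matching `spark-sql-kafka-0-10` package is on the classpath. The broker address, topic names, checkpoint path and function names are all hypothetical, and the "analytics" step is only a pass-through placeholder.

```python
# Sketch only: broker address, topic names and checkpoint path are hypothetical.


def kafka_source_options(bootstrap_servers, topic):
    """Options for Spark's built-in Kafka source (no Flume needed)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def kafka_sink_options(bootstrap_servers, topic, checkpoint_dir):
    """Options for Spark's built-in Kafka sink (no Flume needed)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "topic": topic,
        "checkpointLocation": checkpoint_dir,
    }


def run_pipeline(spark, bootstrap_servers="broker1:9092"):
    """Read events from Kafka, transform them, and write results back to Kafka.

    `spark` is an existing SparkSession created by the caller.
    """
    events = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, "webapp-events"))
        .load()
    )
    # Kafka rows carry binary `key` and `value` columns; cast them to strings.
    # A real job would compute recommendations here instead of passing data through.
    results = events.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
    )
    return (
        results.writeStream.format("kafka")
        .options(
            **kafka_sink_options(
                bootstrap_servers, "recommendations", "/tmp/checkpoints/recs"
            )
        )
        .start()
    )
```

With a running cluster, calling `run_pipeline(spark)` would start the streaming query. The web app would then produce to the input topic and consume from the output topic, so Kafka is the only component between the web app and Spark.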

 

Some guidance from the Hortonworks side: https://community.hortonworks.com/articles/80813/kafka-best-practices-1.html

And from what I hear from Cloudera, they have roughly the same guidelines (which makes sense).

 

regards,

Mathieu
