I am trying to build a recommendation solution with this pipeline: Web App -> Kafka -> Flume? -> Spark -> Flume? -> Kafka -> Web App.
1) What are the benefits or disadvantages of putting Flume between Kafka and Spark for data ingestion?
2) What are the benefits or disadvantages of putting Flume between Spark and Kafka after the analytics step?
3) What are the benefits or disadvantages of running Kafka inside Cloudera Enterprise Data Hub instead of outside it (on a dedicated Kafka server)?
1) Spark can consume messages from Kafka directly. So basically, why add an extra layer between them?
2) Same question: Spark can also write its results back to Kafka directly.
3) Kafka can be "inside your Enterprise Data Hub" and still run on dedicated nodes :) but I guess it's just a matter of what you consider your "Enterprise Data Hub". As for the other part: why dedicated nodes? Performance.
Kafka (and its ZooKeeper cluster) is I/O heavy and can be slowed down if other services use the same disks.
Some "guidance" from the Hortonworks side: https://community.hortonworks.com/articles/80813/kafka-best-practices-1.html
And from what I hear, Cloudera has roughly the same "guidelines" (which makes sense).
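
To illustrate points 1 and 2, here is a minimal sketch of a direct Kafka -> Spark -> Kafka path with no Flume in between, using PySpark Structured Streaming. It assumes a SparkSession created elsewhere with the spark-sql-kafka connector available. The broker address, topic names, checkpoint path and helper function names are all hypothetical, and the recommendation logic itself is left as a placeholder.

```python
# Sketch only: the SparkSession is passed in, so nothing here imports pyspark.
# Assumes the spark-sql-kafka connector is on the Spark classpath.
# All broker addresses, topics and paths used with these helpers are hypothetical.


def kafka_source_options(bootstrap_servers, topic):
    """Options for reading one Kafka topic as a streaming DataFrame."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def kafka_sink_options(bootstrap_servers, topic, checkpoint_dir):
    """Options for writing a streaming DataFrame back to a Kafka topic."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "topic": topic,
        # The Kafka sink needs a checkpoint location to track progress.
        "checkpointLocation": checkpoint_dir,
    }


def start_pipeline(spark, source_opts, sink_opts):
    """Read from Kafka, (placeholder) transform, write back to Kafka."""
    events = spark.readStream.format("kafka").options(**source_opts).load()

    # Kafka rows arrive with binary key/value columns; cast them to strings.
    results = events.selectExpr(
        "CAST(key AS STRING) AS key",
        "CAST(value AS STRING) AS value",
    )
    # Placeholder: the recommendation logic would transform `results` here.

    # The Kafka sink expects a `value` column (and optionally `key`).
    return results.writeStream.format("kafka").options(**sink_opts).start()
```

Usage would look roughly like `start_pipeline(spark, kafka_source_options("broker1:9092", "web-events"), kafka_sink_options("broker1:9092", "recommendations", "/tmp/reco-checkpoint"))`, again with made-up names. The point is only that both hops are covered by Spark's own Kafka source and sink, so Flume would be an additional component to operate rather than a required one.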