Created 12-26-2016 07:41 AM
If I have three servers as log sources and one server where the logs are collected, where should I install Kafka?
- Do I have to install it on each server, or
- Do I need to add one more server just for Kafka, sitting between the source servers and the server where the data is collected?
Thanks.
Created 12-26-2016 07:45 AM
Ideally you should install the Kafka broker on the nodes where the logs should be collected. A topic partition is the unit of parallelism in Kafka: on both the producer and the broker side, writes to different partitions can be done fully in parallel. If you don't have many topics to produce data to, then 1-2 Kafka servers can be enough.
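For illustration, here is a minimal sketch of creating a topic whose partition count matches the number of brokers, so producers and brokers can write to the partitions in parallel. The hostnames, topic name, and partition/replication counts are assumptions, not values from this thread.

# Create a topic with one partition per broker and a replica on every broker.
bin/kafka-topics.sh --create \
  --zookeeper node1:2181,node2:2181,node3:2181 \
  --topic server-logs \
  --partitions 3 \
  --replication-factor 3

# Inspect how the partitions and replicas ended up distributed.
bin/kafka-topics.sh --describe \
  --zookeeper node1:2181,node2:2181,node3:2181 \
  --topic server-logs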
Created 12-27-2016 08:50 AM
so i only have to install Kafka in one server where the logs collected, how if i want to add more server for kafka cluster?
Thanks.
Created 12-29-2016 06:45 PM
Setup:
To get maximum fault tolerance / performance, install Kafka on each node (source server), so you will end up with 3 broker nodes. You will also need ZooKeeper, which you can likewise install on all 3 nodes.
The above setup is fine if this is a PoC; for production use, however, it's recommended to run Kafka + ZooKeeper on nodes other than your source nodes to provide fault tolerance. Additionally, Kafka leans heavily on the OS page cache, which may interfere with the applications already running on those 3 nodes.
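As a rough sketch of what that co-located setup involves (paths, hostnames, and broker IDs below are assumptions, not values from this thread), each of the 3 nodes would run something like:

# Start a ZooKeeper server on every node; config/zookeeper.properties should
# list all 3 nodes so they form a quorum.
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties

# Start a Kafka broker on every node; each broker's config/server.properties
# needs a unique broker.id (e.g. 1, 2, 3) and the same zookeeper.connect,
# e.g. zookeeper.connect=node1:2181,node2:2181,node3:2181
bin/kafka-server-start.sh -daemon config/server.properties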
Just to clarify: Kafka shouldn't be confused with Flume. Kafka is a message broker service; you are responsible for ingesting data into it (e.g. with Flume) and for reading data out of it (e.g. with Storm / Spark).
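To make that split concrete (topic and host names are again assumptions), the console tools that ship with Kafka can stand in for the producing and the consuming side:

# Something has to write into Kafka -- Flume in this thread, or the console
# producer for a quick test.
echo "sample log line" | bin/kafka-console-producer.sh \
  --broker-list node1:9092,node2:9092,node3:9092 \
  --topic server-logs

# ...and something has to read it back out -- Storm / Spark, or the console
# consumer for a quick test.
bin/kafka-console-consumer.sh \
  --bootstrap-server node1:9092 \
  --topic server-logs \
  --from-beginning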
Scaling:
Scaling Kafka is a 3-step operation; step 3 is optional but recommended (see the sketch below).
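The concrete steps aren't listed above, so here is a rough, non-authoritative sketch of the commonly documented way to add a broker to an existing cluster; broker IDs, hostnames, and file names are assumptions:

# Step 1: start the new broker with a unique broker.id in its
# config/server.properties and the same zookeeper.connect as the existing cluster.
bin/kafka-server-start.sh -daemon config/server.properties

# Step 2: move some existing partitions onto the new broker -- Kafka does not
# rebalance data onto new brokers automatically. First generate a candidate
# assignment for the topics listed in topics.json
# (e.g. {"version":1,"topics":[{"topic":"server-logs"}]}) across brokers 1-4:
bin/kafka-reassign-partitions.sh --zookeeper node1:2181 \
  --topics-to-move-json-file topics.json \
  --broker-list "1,2,3,4" --generate
# Save the proposed assignment as reassign.json, then apply it:
bin/kafka-reassign-partitions.sh --zookeeper node1:2181 \
  --reassignment-json-file reassign.json --execute

# Step 3 (optional but recommended): check that the reassignment has completed.
bin/kafka-reassign-partitions.sh --zookeeper node1:2181 \
  --reassignment-json-file reassign.json --verify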
On a side note: Kafka works best in clustered mode. You can run a single-node Kafka, but that fundamentally defeats the purpose of Kafka (partitioning and fault tolerance).