Created 12-26-2016 07:41 AM
If I have three servers as log sources and one server where the logs are collected, where should I install Kafka?
- Do I have to install it on each server, or
- Do I need to add one more server just for Kafka, sitting between the source servers and the server where the data is collected?
Thanks.
Created 12-26-2016 07:45 AM
Ideally you should install the Kafka broker on the nodes where the logs should be collected. A topic partition is the unit of parallelism in Kafka: on both the producer and the broker side, writes to different partitions can be done fully in parallel. If you don't have many topics to produce data to, then 1-2 Kafka servers can be enough.
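For illustration, here is a minimal sketch of creating a topic whose partition count matches the number of brokers, so producers and brokers can write to the partitions in parallel. The hostnames, topic name, and partition/replication counts are assumptions, not values from this thread.

# Create a topic with one partition per broker and a replica on every broker.
bin/kafka-topics.sh --create \
  --zookeeper node1:2181,node2:2181,node3:2181 \
  --topic server-logs \
  --partitions 3 \
  --replication-factor 3

# Inspect how the partitions and replicas ended up distributed.
bin/kafka-topics.sh --describe \
  --zookeeper node1:2181,node2:2181,node3:2181 \
  --topic server-logs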
Created 12-27-2016 08:50 AM
so i only have to install Kafka in one server where the logs collected, how if i want to add more server for kafka cluster?
Thanks.
Created 12-29-2016 06:45 PM
Setup:
To get maximum fault tolerance / performance, install Kafka on each node (source server), so you will end up with 3 broker nodes. You will also need ZooKeeper, which you can likewise install on all 3 nodes.
The above setup is fine if this is a PoC; for production use, however, it's recommended to run Kafka + ZooKeeper on nodes other than your source nodes to provide fault tolerance. Additionally, Kafka leans heavily on the OS page cache, which may interfere with the applications already running on those 3 nodes.
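As a rough sketch of what that co-located setup involves (paths, hostnames, and broker IDs below are assumptions, not values from this thread), each of the 3 nodes would run something like:

# Start a ZooKeeper server on every node; config/zookeeper.properties should
# list all 3 nodes so they form a quorum.
bin/zookeeper-server-start.sh -daemon config/zookeeper.properties

# Start a Kafka broker on every node; each broker's config/server.properties
# needs a unique broker.id (e.g. 1, 2, 3) and the same zookeeper.connect,
# e.g. zookeeper.connect=node1:2181,node2:2181,node3:2181
bin/kafka-server-start.sh -daemon config/server.properties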
Just to clarify: Kafka shouldn't be confused with Flume. Kafka is a message broker service; you are responsible for ingesting data into it (e.g. with Flume) and for reading data out of it (e.g. with Storm / Spark).
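To make that split concrete (topic and host names are again assumptions), the console tools that ship with Kafka can stand in for the producing and the consuming side:

# Something has to write into Kafka -- Flume in this thread, or the console
# producer for a quick test.
echo "sample log line" | bin/kafka-console-producer.sh \
  --broker-list node1:9092,node2:9092,node3:9092 \
  --topic server-logs

# ...and something has to read it back out -- Storm / Spark, or the console
# consumer for a quick test.
bin/kafka-console-consumer.sh \
  --bootstrap-server node1:9092 \
  --topic server-logs \
  --from-beginning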
Scaling:
Scaling Kafka is a 3-step operation; step 3 is optional but recommended (see the sketch below).
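The concrete steps aren't listed above, so here is a rough, non-authoritative sketch of the commonly documented way to add a broker to an existing cluster; broker IDs, hostnames, and file names are assumptions:

# Step 1: start the new broker with a unique broker.id in its
# config/server.properties and the same zookeeper.connect as the existing cluster.
bin/kafka-server-start.sh -daemon config/server.properties

# Step 2: move some existing partitions onto the new broker -- Kafka does not
# rebalance data onto new brokers automatically. First generate a candidate
# assignment for the topics listed in topics.json
# (e.g. {"version":1,"topics":[{"topic":"server-logs"}]}) across brokers 1-4:
bin/kafka-reassign-partitions.sh --zookeeper node1:2181 \
  --topics-to-move-json-file topics.json \
  --broker-list "1,2,3,4" --generate
# Save the proposed assignment as reassign.json, then apply it:
bin/kafka-reassign-partitions.sh --zookeeper node1:2181 \
  --reassignment-json-file reassign.json --execute

# Step 3 (optional but recommended): check that the reassignment has completed.
bin/kafka-reassign-partitions.sh --zookeeper node1:2181 \
  --reassignment-json-file reassign.json --verify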
On a side note: Kafka works best in clustered mode. You can run a single-node Kafka, but that fundamentally defeats the purpose of Kafka (partitioning and fault tolerance).