
Where does Kafka have to be installed?


Contributor

If I have three servers as log sources and one server where the logs are collected, where should I install Kafka?

- do I have to install it on each server, or

- do I need to add one more server between the source servers and the collection server, and install Kafka there?

Thanks.

1 ACCEPTED SOLUTION

Re: Where does Kafka have to be installed?

Rising Star

Setup:

For maximum fault tolerance / performance, install Kafka on each node (source server), so you will end up with 3 broker nodes. You will also need ZooKeeper, which you can likewise install on all 3 nodes.

The above setup is fine for a PoC; for production use, however, it's recommended to run Kafka + ZooKeeper on nodes other than your source nodes to provide fault tolerance. Additionally, Kafka makes heavy use of the OS page cache, which may interfere with the applications running on those 3 nodes.
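As a rough sketch of what such a 3-node broker setup might look like, each broker would get a `server.properties` along these lines (the hostnames `kafka1` and `zk1`..`zk3` are hypothetical placeholders, not from the thread, and exact property names vary by Kafka version):

```properties
# Unique per broker: 1, 2, 3 on the three nodes
broker.id=1
listeners=PLAINTEXT://kafka1:9092
log.dirs=/var/kafka-logs
# List all three ZooKeeper nodes so the broker survives a single ZK failure
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
# Replicate each partition to all 3 brokers for fault tolerance
default.replication.factor=3
```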

Just to clarify: Kafka shouldn't be confused with Flume. Kafka is a message-broker service; you are responsible for ingesting data into it (e.g. with Flume) and for reading data out of it (e.g. with Storm / Spark).

Scaling:

Scaling Kafka is a 3-step operation; step 3 is optional but recommended:

  1. Add nodes to the cluster (Use Ambari to add nodes)
  2. Alter Topic and add additional partitions (1 partition / node)
  3. (Optional) Rebalance Kafka
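Steps 2 and 3 above are usually done with the scripts that ship with Kafka. A sketch, assuming a topic named `logs` and a ZooKeeper at `zk1:2181` (both hypothetical; note that newer Kafka releases use `--bootstrap-server` where these older Ambari-era releases use `--zookeeper`):

```shell
# Step 2: raise the partition count so the new broker can take a share
# (partition counts can only be increased, never decreased)
kafka-topics.sh --alter --topic logs --partitions 4 \
  --zookeeper zk1:2181

# Step 3 (optional): move existing partitions onto the new broker,
# using a reassignment plan (reassign.json) generated beforehand
kafka-reassign-partitions.sh --zookeeper zk1:2181 \
  --reassignment-json-file reassign.json --execute
```

Both commands require a running cluster, so treat this as an outline of the procedure rather than a copy-paste recipe.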

On a side note: Kafka works best in clustered mode. You can run a single-node Kafka, but it fundamentally defeats the purpose of Kafka (partitioning and fault tolerance).


3 REPLIES

Re: Where does Kafka have to be installed?

@Bramantya Anggriawan

Ideally you should install the Kafka brokers on the nodes where the logs are to be collected. The topic partition is the unit of parallelism in Kafka: on both the producer and the broker side, writes to different partitions can be done fully in parallel. If you don't have many topics to produce data to, then 1-2 Kafka servers can be enough.
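To make the parallelism point concrete, here is a simplified sketch of how a keyed producer spreads records across partitions (Kafka's Java client actually uses murmur2 hashing; `crc32` below is only a deterministic stand-in, and the host names are hypothetical):

```python
import zlib

def pick_partition(key: str, num_partitions: int) -> int:
    """Simplified stand-in for Kafka's default keyed partitioner:
    hash the key modulo the partition count, so the same key always
    lands on the same partition (preserving per-key ordering) while
    different keys spread across partitions/brokers."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Three source servers writing to a 3-partition topic: each key is
# pinned to one partition, and distinct partitions can be written
# and consumed in parallel.
for host in ["web-01", "web-02", "web-03"]:
    print(host, "->", pick_partition(host, 3))
```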

Highlighted

Re: Where does Kafka have to be installed?

Contributor

So I only have to install Kafka on the server where the logs are collected? And what if I want to add more servers to the Kafka cluster?

Thanks.

