
Can kafka broker, yarn nodemanager, hdfs data node be installed on same host?

Rising Star

I recently ran into a significant performance problem.

I have about 30 Spark Streaming applications that read data from Kafka and write it to HDFS. Recently, writes on some Spark executors have become very slow. Each Spark task handles a similar amount of data, but task durations differ widely: the slowest task takes about 4 times as long as the fastest.

I have checked disk usage: on some hosts the disks are busy about 80% to 90% of the time.
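For anyone who wants to reproduce a per-disk busy figure like the one above, here is a minimal Python sketch. It assumes Linux hosts and the standard /proc/diskstats layout, where the 13th whitespace-separated column is the cumulative milliseconds the device spent doing I/O. The function names are made up for illustration, and this is not necessarily how the numbers above were collected.

```python
import time


def parse_io_ticks(diskstats_text):
    """Map device name -> cumulative milliseconds spent doing I/O."""
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpected lines
        # fields: major, minor, name, then the per-device counters;
        # index 12 is "time spent doing I/Os (ms)"
        ticks[fields[2]] = int(fields[12])
    return ticks


def busy_percent(before_text, after_text, interval_s):
    """Percent of the interval each device was busy between two samples."""
    before = parse_io_ticks(before_text)
    after = parse_io_ticks(after_text)
    interval_ms = interval_s * 1000.0
    return {
        dev: 100.0 * (after[dev] - before[dev]) / interval_ms
        for dev in after
        if dev in before
    }


def sample_busy_percent(interval_s=5, path="/proc/diskstats"):
    """Take two samples interval_s seconds apart (Linux only)."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval_s)
    with open(path) as f:
        after = f.read()
    return busy_percent(before, after, interval_s)
```

Calling sample_busy_percent(5) on each host returns a dict of device name to busy percentage, which makes it easier to see whether the slow executors line up with the busiest disks.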

So my guess is that slow HDFS writes are the cause, because my Kafka broker, HDFS DataNode, and YARN NodeManager all run on the same hosts.

Does this co-location actually affect performance?

Re: Can kafka broker, yarn nodemanager, hdfs data node be installed on same host?

Super Collaborator

"Can they" - Yes.

"Should they" - I would say no.

Kafka is very memory and disk sensitive. Depending on your use of it, it could even use more I/O than the combination of the DataNode and NodeManager on the same machine.

Personally, I would recommend installing Kafka brokers on dedicated hardware, even separate from the ZooKeeper servers they need, if at all possible.

The Spark executors do not need to run on the Kafka brokers; they should work fine running on the YARN NodeManagers and pulling from Kafka remotely.

Re: Can kafka broker, yarn nodemanager, hdfs data node be installed on same host?

Rising Star

Thanks @Jordan Moore

Since "Kafka is very memory and disk sensitive", would you recommend installing the Kafka brokers on virtual machines? I cannot get more dedicated machines for Kafka.

Re: Can kafka broker, yarn nodemanager, hdfs data node be installed on same host?

Super Collaborator

@Junfeng Chen, as mentioned, it depends on your use of it. It will run okay in most deployment patterns, and it can run fine in VMs, but of course dedicated hardware is always preferred.
