Can a Kafka broker, YARN NodeManager, and HDFS DataNode be installed on the same host?

Rising Star

I have recently run into a significant performance problem.

I have about 30 Spark Streaming applications that read data from Kafka and write it to HDFS. Recently, writes on some Spark executors have become very slow. The amount of data per Spark task is similar, but task durations differ widely: the slowest task takes about 4 times as long as the fastest.
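One way to check whether the slow tasks cluster on particular hosts is to group task durations by host and compare the medians. The following is a minimal sketch, not part of the original post: the `(host, duration)` pairs, the host names, and the `factor` threshold are all hypothetical, and in practice the durations would have to be copied from the Spark UI or the event logs.

```python
from collections import defaultdict
from statistics import median


def median_duration_by_host(tasks):
    """Group (host, duration_seconds) pairs and return the median per host."""
    by_host = defaultdict(list)
    for host, duration in tasks:
        by_host[host].append(duration)
    return {host: median(durations) for host, durations in by_host.items()}


def slow_hosts(tasks, factor=2.0):
    """Return hosts whose median task duration is at least `factor` times
    the median of the fastest host. `factor` is an arbitrary threshold."""
    medians = median_duration_by_host(tasks)
    if not medians:
        return []
    baseline = min(medians.values())
    return sorted(h for h, m in medians.items() if m >= factor * baseline)


# Hypothetical sample data: host names and durations are made up.
sample = [
    ("host-a", 10.0), ("host-a", 12.0),
    ("host-b", 40.0), ("host-b", 44.0),
    ("host-c", 11.0), ("host-c", 13.0),
]
print(slow_hosts(sample))  # ['host-b']
```

If the same few hosts keep showing up, that points to contention on those machines rather than to data skew.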

I checked disk usage, and the disk busy time on some hosts is about 80% to 90%.
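For reference, assuming the disk figure quoted above is a busy-time percentage like the `%util` column of `iostat`, it can be reproduced from two snapshots of `/proc/diskstats` on Linux. This is a rough sketch with made-up sample lines: the device name `sda` and all counter values are hypothetical. It relies on the tenth statistics field of each line (the 13th column overall), which is the number of milliseconds the device spent doing I/O.

```python
def parse_io_ticks(diskstats_text):
    """Map device name -> milliseconds spent doing I/O.

    Each /proc/diskstats line is: major, minor, name, then the counters.
    The tenth counter (index 12 overall) is the time spent doing I/O in ms.
    """
    ticks = {}
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) < 13:
            continue  # skip blank or truncated lines
        ticks[fields[2]] = int(fields[12])
    return ticks


def busy_percent(before, after, interval_ms):
    """Percentage of the sampling interval each device spent busy."""
    return {
        dev: 100.0 * (after[dev] - before[dev]) / interval_ms
        for dev in before
        if dev in after
    }


# Hypothetical snapshots taken 1000 ms apart (values are made up).
snap1 = "   8       0 sda 100 0 800 50 200 0 1600 90 0 1000 140"
snap2 = "   8       0 sda 130 0 990 70 260 0 2100 120 0 1850 190"

usage = busy_percent(parse_io_ticks(snap1), parse_io_ticks(snap2), 1000)
print(usage)
```

On a real host the two snapshots would come from reading `/proc/diskstats` twice with a known delay in between. Sampling each data disk separately helps show whether the Kafka log directories and the DataNode directories share the busy devices.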

So I suspect the cause is slow HDFS writes, because my Kafka brokers, HDFS DataNodes, and YARN NodeManagers are located on the same hosts.

Does this colocation actually affect performance?


Super Collaborator

"Can they" - Yes.

"Should they" - I would say no.

Kafka is very memory and disk sensitive. Depending on your use of it, it could even use more I/O than the combination of the DataNode and NodeManager on the same machine.

Personally, I would recommend installing Kafka brokers on dedicated hardware, even separate from the Zookeeper servers it needs, if at all possible.

The Spark executors do not need to run on the Kafka brokers; they should work fine running on the YARN NodeManagers and pulling data from Kafka remotely.

Rising Star

Thanks @Jordan Moore

Since Kafka is very memory and disk sensitive, would you recommend installing Kafka brokers on virtual machines? I cannot get more dedicated machines for Kafka.

Super Collaborator

@Junfeng Chen, as mentioned, it depends on your use of it. It will run okay in most deployment patterns, and it can run fine in VMs, but of course dedicated hardware is always preferred.
