I have 3 servers to use as Kafka brokers, each with 12 drives. We will have 4 non-partitioned topics initially. As I understand it, every topic has at least one partition, so in essence we will have 4 partitions.
I am torn between JBOD and multiple RAID 1 arrays for this scenario. If I do JBOD, then each topic will only utilize ONE disk on each broker since it's basically mounted to a single directory - correct? But if I went with five RAID 1 arrays - one for each topic, I would gain more spindles per topic plus the benefit of fault tolerance.
As an alternative I could simply do one large RAID 10 array with the 12 drives. Any thoughts?
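To make the trade-off between the three layouts concrete, here is a rough sketch of the arithmetic for one 12-drive broker. The function name and layout labels are made up for illustration, and capacity is counted in whole drives because the drive size isn't given. Note that twelve drives split into two-drive mirrors yield six arrays.

```python
def layout_summary(total_drives: int, layout: str) -> dict:
    """Summarize a disk layout for one broker (capacity in units of one drive)."""
    if layout == "jbod":
        # Every drive is its own mount point; no redundancy at the disk level.
        return {"mount_points": total_drives,
                "usable_drives": total_drives,
                "redundant": False}
    if total_drives % 2 != 0:
        raise ValueError("mirrored layouts need an even number of drives")
    if layout == "raid1_pairs":
        # Two-drive mirrors: one mount point per pair, half the raw capacity.
        return {"mount_points": total_drives // 2,
                "usable_drives": total_drives // 2,
                "redundant": True}
    if layout == "raid10":
        # One striped set of mirrors: a single mount point, half the raw capacity.
        return {"mount_points": 1,
                "usable_drives": total_drives // 2,
                "redundant": True}
    raise ValueError(f"unknown layout: {layout}")


for name in ("jbod", "raid1_pairs", "raid10"):
    print(name, layout_summary(12, name))
```

The mount point count matters because each mount point would be a separate entry in the broker's `log.dirs` setting, and a single partition lives entirely inside one of those directories.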
1) What is the motivation for not partitioning the topics?
"The partitions in the log serve several purposes. First, they allow the log to scale beyond a size that will fit on a single server. Each individual partition must fit on the servers that host it, but a topic may have many partitions so it can handle an arbitrary amount of data. Second they act as the unit of parallelism—more on that in a bit."
2) RAID 10 gives you some flexibility in the case of drive failure. You will still have to take an outage to rebuild, but you can plan for it. If you lose a disk in a JBOD configuration, your broker will shut down. Another downside of JBOD is that there really isn't any intelligent partition assignment logic, so it might be a bit trickier to manage.
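To illustrate the point about assignment logic: as far as I understand it, when a broker has several `log.dirs` it puts a new partition in the directory that currently holds the fewest partitions, without looking at size or I/O load. The sketch below only simulates that rule and is not Kafka code. The directory paths and topic names are hypothetical, and the tie-breaking (first listed directory wins) is an assumption.

```python
def pick_log_dir(log_dirs, partition_counts):
    """Return the directory holding the fewest partitions.

    Ties go to the directory listed first (an assumption for this sketch).
    """
    return min(log_dirs, key=lambda d: partition_counts.get(d, 0))


def place_partitions(log_dirs, new_partitions):
    """Simulate placing partitions one at a time by partition count only."""
    counts = {d: 0 for d in log_dirs}
    placement = {}
    for partition in new_partitions:
        chosen = pick_log_dir(log_dirs, counts)
        placement[partition] = chosen
        counts[chosen] += 1
    return placement


# Hypothetical mount points and topic names.
dirs = ["/data/d1", "/data/d2", "/data/d3"]
partitions = ["hl7-0", "topicB-0", "topicC-0", "topicD-0"]
for partition, directory in place_partitions(dirs, partitions).items():
    print(partition, "->", directory)
```

With three directories and four partitions, the fourth partition lands next to the first one, so a busy partition can end up sharing a disk with another regardless of how much traffic it carries.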
I'd also suggest you read the Kafka Improvement Proposal (KIP) on JBOD enhancements:
Thank you for your feedback. Sorry for the delay. I replied via email but it didn't post.
The motivation behind not partitioning this specific topic is needing to keep the message order of the HL7 data. And this will be the "busiest" topic in terms of I/O. The other topics could possibly be partitioned, but a lot of it is TBD as we ramp up use cases.
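For context on the ordering constraint: Kafka guarantees order only within a single partition, and messages that share a key are routed to the same partition. If ordering were only needed per key (for example per patient or per sending system, which is purely an assumption about the HL7 feed), a partitioned topic would still keep each key's messages in order. The sketch below simulates that idea; it uses CRC32 as a stand-in hash rather than the murmur2 hash Kafka's default partitioner applies to keys, and the keys and payloads are invented.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition. CRC32 is a stand-in for Kafka's real hash."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Append (key, payload) pairs to per-partition logs in arrival order."""
    logs = defaultdict(list)
    for key, payload in messages:
        logs[partition_for(key, num_partitions)].append((key, payload))
    return dict(logs)


# Invented sample data: two keys with interleaved messages.
messages = [
    ("patient-1", "admit"),
    ("patient-2", "admit"),
    ("patient-1", "lab-result"),
    ("patient-2", "discharge"),
    ("patient-1", "discharge"),
]
for partition, log in sorted(route(messages, 4).items()):
    print(partition, log)
```

Order across different keys is not preserved this way, so if the consumers need a single global order for all HL7 messages, one partition remains the only option.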
I agree with your assessment of RAID 10, and I will read the other documentation (already read one of them).
Jeff (or anyone else too),
When creating a Kafka topic, can I tell it where (what mount point) to create the initial partition? Or does it pick this randomly?
The reason I ask: since the HL7 topic has to maintain message order and can't be partitioned, I was thinking about creating a RAID 1 array for that topic only and using JBOD for the rest. But if I can't tell Kafka which mount point to put the initial partition on when creating the topic, then it's no use.