Support Questions

Find answers, ask questions, and share your expertise

Kafka apache on jbod disks + why kafka broker failed when one disk reached 100%

avatar

we have 3 Kafka brokers machines on on RHEL 7.9 Linux version ,

 

( when each machine is physical strong `DELL HW` - memory = 512G and 96 CORE CPU )

 

Kafka cluster is in production mode

 

 

Kafka version is 2.7.x , and Kafka disks are in Jbod configuration

each Kafka broker has 8 Jbod disks , as we can see from the following ( df -h details )


df -h

/dev/sdc 1.7T 929G 748G 56% /kafka/kafka_logs2
/dev/sdd 1.7T 950G 727G 57% /kafka/kafka_logs3
/dev/sde 1.7T 999G 678G 60% /kafka/kafka_logs4
/dev/sdf 1.7T 971G 706G 58% /kafka/kafka_logs5
/dev/sdg 1.7T 1.7T 20K 100% /kafka/kafka-logs6 <-----------------
/dev/sdh 1.7T 962G 714G 58% /kafka/kafka_logs7
/dev/sdi 1.7T 1.1T 621G 63% /kafka/kafka_logs8

 

 

as we can see from above disk - `/kafka/kafka_logs6` get `100%` used


after short investigation we found that Kafka broker isn't tolerant when one disk is failed or disk reached to 100% , as results of this Kafka broker now is down

here the Kafka `server.log`

 

 

[2022-11-29 15:43:59,723] ERROR Error while writing to checkpoint file /kafka/kafka-logs6 .............
java.io.IOException: No space left on device
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:295)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:141)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:229)
at java.io.BufferedWriter.flush(BufferedWriter.java:254)
at kafka.server.checkpoints.CheckpointFile.liftedTree1$1(CheckpointFile.scala:108)
at kafka.server.checkpoints.CheckpointFile.write(CheckpointFile.scala:92)
at kafka.server.checkpoints.LeaderEpochCheckpointFile.write(LeaderEpochCheckpointFile.scala:70)
at kafka.server.epoch.LeaderEpochFileCache.flush(LeaderEpochFileCache.scala:292)
at kafka.server.epoch.LeaderEpochFileCache.$anonfun$truncateFromEnd$1(LeaderEpochFileCache.scala:238)
at kafka.server.epoch.LeaderEpochFileCache.truncateFromEnd(LeaderEpochFileCache.scala:235)
at kafka.log.Log.$anonfun$new$1(Log.scala:305)
at kafka.log.Log.<init>(Log.scala:305)
at kafka.log.Log$.apply(Log.scala:2549)
at kafka.log.LogManager.loadLog(LogManager.scala:273)
at kafka.log.LogManager.$anonfun$loadLogs$12(LogManager.scala:352)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)

**from my perspective** when we have **8 disks** on each broker and one disks is failed ( like reached 100% ) , then we expect Kafka broker still be alive even one disk is failed

so as results of above scenario we searched in Kafka `server.properties` the parameter that can help us to configure the Kafka broker to be tolerant when one disk is failed
but we not found it , or maybe we not know what to set in order to define the kafka broker to be tolerant with one disk failure


the full parameters are:


more server.properties
auto.create.topics.enable=false
auto.leader.rebalance.enable=true
background.threads=10
log.retention.bytes=-1
log.retention.hours=48
delete.topic.enable=true
leader.imbalance.check.interval.seconds=300
leader.imbalance.per.broker.percentage=10
log.dir=/kafka/kafka-logs2,/kafka/kafka-logs3 ...............
log.flush.interval.messages=9223372036854775807
log.flush.interval.ms=1000
log.flush.offset.checkpoint.interval.ms=60000
log.flush.scheduler.interval.ms=9223372036854775807
log.flush.start.offset.checkpoint.interval.ms=60000
compression.type=producer
log.roll.jitter.hours=0
log.segment.bytes=1073741824
log.segment.delete.delay.ms=60000
message.max.bytes=1000012
min.insync.replicas=1
num.io.threads=10
num.network.threads=48
num.recovery.threads.per.data.dir=1
num.replica.fetchers=1
offset.metadata.max.bytes=4096
offsets.commit.required.acks=-1
offsets.commit.timeout.ms=5000
offsets.load.buffer.size=5242880
offsets.retention.check.interval.ms=600000
offsets.retention.minutes=10080
offsets.topic.compression.codec=0
offsets.topic.num.partitions=50
offsets.topic.replication.factor=3
offsets.topic.segment.bytes=104857600
queued.max.requests=1000
quota.consumer.default=9223372036854775807
quota.producer.default=9223372036854775807
replica.fetch.min.bytes=1
replica.fetch.wait.max.ms=500
replica.high.watermark.checkpoint.interval.ms=5000
replica.lag.time.max.ms=10000
replica.socket.receive.buffer.bytes=65536
replica.socket.timeout.ms=30000
request.timeout.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
transaction.max.timeout.ms=900000
transaction.state.log.load.buffer.size=5242880
transaction.state.log.min.isr=2
transaction.state.log.num.partitions=50
transaction.state.log.replication.factor=3
transaction.state.log.segment.bytes=104857600
transactional.id.expiration.ms=604800000
unclean.leader.election.enable=false
zookeeper.connection.timeout.ms=600000
zookeeper.max.in.flight.requests=10
zookeeper.session.timeout.ms=600000
zookeeper.set.acl=false
broker.id.generation.enable=true
connections.max.idle.ms=600000
connections.max.reauth.ms=0
controlled.shutdown.enable=true
controlled.shutdown.max.retries=3
controlled.shutdown.retry.backoff.ms=5000
controller.socket.timeout.ms=30000
default.replication.factor=3
delegation.token.expiry.time.ms=86400000
delegation.token.max.lifetime.ms=604800000
delete.records.purgatory.purge.interval.requests=1
fetch.purgatory.purge.interval.requests=1000
group.initial.rebalance.delay.ms=3000
group.max.session.timeout.ms=1800000
group.max.size=2147483647
group.min.session.timeout.ms=6000
log.cleaner.backoff.ms=15000
log.cleaner.dedupe.buffer.size=134217728
log.cleaner.delete.retention.ms=86400000
log.cleaner.enable=true
log.cleaner.io.buffer.load.factor=0.9
log.cleaner.io.buffer.size=524288
log.cleaner.io.max.bytes.per.second=1.7976931348623157e308
log.cleaner.max.compaction.lag.ms=9223372036854775807
log.cleaner.min.cleanable.ratio=0.5
log.cleaner.min.compaction.lag.ms=0
log.cleaner.threads=1
log.cleanup.policy=delete
log.index.interval.bytes=4096
log.index.size.max.bytes=10485760
log.message.timestamp.difference.max.ms=9223372036854775807
log.message.timestamp.type=CreateTime
log.preallocate=false
log.retention.check.interval.ms=300000
max.connections=2147483647
max.connections.per.ip=2147483647
max.incremental.fetch.session.cache.slots=1000
num.partitions=1
producer.purgatory.purge.interval.requests=1000
queued.max.request.bytes=-1
replica.fetch.backoff.ms=1000
replica.fetch.max.bytes=1048576
replica.fetch.response.max.bytes=10485760
reserved.broker.max.id=1500
transaction.abort.timed.out.transaction.cleanup.interval.ms=60000
transaction.remove.expired.transaction.cleanup.interval.ms=3600000
zookeeper.sync.time.ms=2000
broker.rack=/default-rack

 

I want to add my personal feeling:

*just to gives here the absurder of the above scenario
lets say we have on each Kafka broker 100 disks ( in Jbod )
is it make sense that Kafka broker will be shutdown just because one disk is failed ?*

Michael-Bronson
0 REPLIES 0