[HDP-2.3.2] Kafka brokers(topics) suddenly becomes unavailable.

Expert Contributor

After an HDP rolling upgrade, we are seeing strange behavior on our Kafka cluster.

Everything was fine before the upgrade.

Since the upgrade, we keep hitting an issue where all Kafka brokers and topics suddenly become unavailable.

When we check ZooKeeper, it shows that the brokers have been unregistered. The logs also show "cached zkversion is not equal to that in zookeeper".
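For anyone who wants to script the ZooKeeper check described above, here is a minimal sketch. It assumes the brokers register under the standard `/brokers/ids` path (no chroot) and that the third-party `kazoo` client is installed; the function names and the expected ID list are hypothetical, not something from this cluster's tooling.

```python
def unregistered_brokers(expected_ids, registered_children):
    """Return the expected broker IDs that are missing from ZooKeeper.

    registered_children is the list of child node names under /brokers/ids,
    which ZooKeeper returns as strings.
    """
    registered = {int(child) for child in registered_children}
    return sorted(set(expected_ids) - registered)


def fetch_registered_brokers(zk_hosts):
    """List the broker IDs currently registered in ZooKeeper.

    Assumption: kazoo is installed and Kafka uses the default
    /brokers/ids path. Adjust the path if a chroot is configured.
    """
    from kazoo.client import KazooClient  # third-party, imported lazily

    zk = KazooClient(hosts=zk_hosts)
    zk.start()
    try:
        return zk.get_children("/brokers/ids")
    finally:
        zk.stop()


# Offline example: only broker 1001 is still registered.
print(unregistered_brokers([1001, 1002, 1003], ["1001"]))
```

Against a live cluster, the result of `fetch_registered_brokers("zkhost:2181")` (a placeholder host) would be passed in as the second argument.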

Our quick fix was to restart ZooKeeper and Kafka. This has already happened twice in our production environment. :( #dataloss

Does anyone have this same issue?

Please help.

5 REPLIES

Re: [HDP-2.3.2] Kafka brokers(topics) suddenly becomes unavailable.

Expert Contributor

I can see these entries in server.log:

[2016-03-06 13:04:24,414] INFO Partition [NSN_IN_RECHARGE2,0] on broker 1001: Shrinking ISR for partition [NSN_IN_RECHARGE2,0] from 1001,1003 to 1001 (kafka.cluster.Partition)
[2016-03-06 13:04:24,417] INFO Partition [NSN_IN_ZEROEXP,2] on broker 1001: Shrinking ISR for partition [NSN_IN_ZEROEXP,2] from 1001,1003 to 1001 (kafka.cluster.Partition)
[2016-03-06 13:04:24,419] INFO Partition [NSN_IN_DATA2,0] on broker 1001: Shrinking ISR for partition [NSN_IN_DATA2,0] from 1002,1001 to 1001 (kafka.cluster.Partition)
[2016-03-06 13:04:24,421] INFO Partition [NSN_IN_AIRTIMEREL,1] on broker 1001: Shrinking ISR for partition [NSN_IN_AIRTIMEREL,1] from 1001,1003 to 1001 (kafka.cluster.Partition)
[2016-03-06 13:04:24,423] INFO Partition [RESPONSE_SMS,0] on broker 1001: Shrinking ISR for partition [RESPONSE_SMS,0] from 1001,1003 to 1001 (kafka.cluster.Partition)
[2016-03-06 13:04:24,425] INFO Partition [MEF1.1,3] on broker 1001: Shrinking ISR for partition [MEF1.1,3] from 1001,1003 to 1001 (kafka.cluster.Partition)
[2016-03-06 13:04:24,434] INFO Partition [NSN_IN_AIRTIMEREL,5] on broker 1001: Shrinking ISR for partition [NSN_IN_AIRTIMEREL,5] from 1001,1003 to 1001 (kafka.cluster.Partition)
[2016-03-06 13:04:24,436] INFO Partition [TNT_IN_DATA,0] on broker 1001: Shrinking ISR for partition [TNT_IN_DATA,0] from 1001,1003 to 1001 (kafka.cluster.Partition)
[2016-03-06 13:04:24,438] INFO Partition [URM_MWTRF,0] on broker 1001: Shrinking ISR for partition [URM_MWTRF,0] from 1001,1003 to 1001 (kafka.cluster.Partition)
[2016-03-06 13:04:24,439] INFO Partition [AUDIT_SPARK_INGEST,1] on broker 1001: Shrinking ISR for partition [AUDIT_SPARK_INGEST,1] from 1001,1003 to 1001 (kafka.cluster.Partition)
[2016-03-06 13:04:24,441] INFO Partition [TNT_IN_VOU,0] on broker 1001: Shrinking ISR for partition [TNT_IN_VOU,0] from 1001,1003 to 1001 (kafka.cluster.Partition)
[2016-03-06 13:04:24,443] INFO Partition [MEF1.1-SSQCx,2] on broker 1001: Shrinking ISR for partition [MEF1.1-SSQCx,2] from 1001,1003 to 1001 (kafka.cluster.Partition)
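To see at a glance which replica keeps falling out of the ISR, the entries above can be summarized with a small script. This is only a sketch: the regular expression is written against the line format quoted in this post, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches the "Shrinking ISR" INFO lines in the format quoted above.
SHRINK_RE = re.compile(
    r"Shrinking ISR for partition \[(?P<topic>[^,\]]+),(?P<partition>\d+)\] "
    r"from (?P<old>[\d,]+) to (?P<new>[\d,]+)"
)


def dropped_replicas(log_lines):
    """Count how often each broker ID was removed from an ISR."""
    dropped = Counter()
    for line in log_lines:
        match = SHRINK_RE.search(line)
        if match is None:
            continue  # not an ISR shrink line
        old = set(match.group("old").split(","))
        new = set(match.group("new").split(","))
        dropped.update(old - new)
    return dropped


sample = [
    "[2016-03-06 13:04:24,414] INFO Partition [NSN_IN_RECHARGE2,0] on broker 1001: "
    "Shrinking ISR for partition [NSN_IN_RECHARGE2,0] from 1001,1003 to 1001 "
    "(kafka.cluster.Partition)",
    "[2016-03-06 13:04:24,419] INFO Partition [NSN_IN_DATA2,0] on broker 1001: "
    "Shrinking ISR for partition [NSN_IN_DATA2,0] from 1002,1001 to 1001 "
    "(kafka.cluster.Partition)",
]
print(dropped_replicas(sample))
```

Run over the twelve lines above, this counts broker 1003 as dropped eleven times and broker 1002 once, so broker 1001 is the only replica left in the ISR for every listed partition.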

Re: [HDP-2.3.2] Kafka brokers(topics) suddenly becomes unavailable.

Explorer

It looks like you might be hitting KAFKA-2729 and KAFKA-3042. After a controller failover, the metadata cache may not include the leader's details in its live-broker information, which causes the follower to fail.

Re: [HDP-2.3.2] Kafka brokers(topics) suddenly becomes unavailable.

Expert Contributor

Thanks for the link @Alberto Romero, will check it later.

Re: [HDP-2.3.2] Kafka brokers(topics) suddenly becomes unavailable.

Hi @Michael Dennis Uanang, have you resolved this? We recently did a rolling upgrade (RU) of a cluster with Kafka and ended up manually changing the broker IDs in meta.properties on each broker's volume from 1001, 1002, ... to 0, 1, 2, ... Also, if you are using a custom port (different from the default 6667), make sure it's set in the "listeners" property.
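If you would rather script the meta.properties change than edit each file by hand, something like the following could work. It is a sketch under a few assumptions: the file holds a plain `broker.id=<n>` line, the broker is stopped and the file backed up first, and the same ID is written to every log directory of that broker. The function name and the sample file content are illustrative only.

```python
def set_broker_id(meta_text, new_id):
    """Return meta.properties content with broker.id replaced by new_id.

    All other lines (comments, version=...) are kept unchanged.
    """
    out = []
    replaced = False
    for line in meta_text.splitlines():
        if line.strip().startswith("broker.id="):
            out.append(f"broker.id={new_id}")
            replaced = True
        else:
            out.append(line)
    if not replaced:
        raise ValueError("no broker.id entry found")
    return "\n".join(out) + "\n"


# Illustrative content; a real file may carry different comment lines.
example = "#\nversion=0\nbroker.id=1001\n"
print(set_broker_id(example, 0))
```

If broker.id is also set explicitly in server.properties, it has to match the value written here, otherwise the broker is likely to refuse to start.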

Re: [HDP-2.3.2] Kafka brokers(topics) suddenly becomes unavailable.

Expert Contributor

Manually changing the broker ID also works.

For the ISR shrinking, we adjusted the replica settings on the brokers so that all brokers stay in sync (in the ISR). We also increased the broker heap size from 1G to 2G. :) Thanks!
