Weird behavior in Kafka consumer on HDP 2.6.5: not consuming properly, impacting Atlas as well

Explorer

Dear all,

After upgrading my lab from HDP 2.6.4 to 2.6.5, to check all components before upgrading the DEV and PROD clusters, I noticed that Atlas was not working properly: the lineage information was missing. Digging deeper, I found a Kafka issue: kafka-console-consumer does not receive any messages with the simple commands below:

hduser@> kafka-console-consumer.sh --bootstrap-server 10.0.20.101:6667,10.0.20.102:6667,10.0.20.103:6667  --topic kennon_2 --security-protocol SASL_PLAINTEXT --from-beginning

hduser@> kafka-console-consumer.sh --bootstrap-server 10.0.20.101:6667,10.0.20.102:6667,10.0.20.103:6667  --topic ATLAS_HOOK --security-protocol SASL_PLAINTEXT --from-beginning

hduser@> kafka-console-consumer.sh --bootstrap-server 10.0.20.101:6667,10.0.20.102:6667,10.0.20.103:6667  --topic ATLAS_ENTITIES --security-protocol SASL_PLAINTEXT --from-beginning

No messages are consumed, although I checked the producer and all the messages are reaching the topic. If I add the option '--partition 0', for example, I see only the messages from that partition, and if I change --bootstrap-server to --zookeeper I can see most of the messages, roughly 80%, but not all of them. There is no error in the log in this case, because the connection is established and the application simply waits for messages to consume.

hduser@> kafka-console-consumer.sh --bootstrap-server 10.0.20.101:6667,10.0.20.102:6667,10.0.20.103:6667  --topic kennon_2 --security-protocol SASL_PLAINTEXT --from-beginning --partition 0
or
hduser@> kafka-console-consumer.sh --zookeeper 10.0.20.101:2181,10.0.20.102:2181,10.0.20.103:2181/kafka  --topic kennon_2 --security-protocol SASL_PLAINTEXT --from-beginning 

I have another lab where I deployed a small cluster from scratch to test HDP 2.6.5, and I'm facing the same Kafka issue there.

The concern is that, if we upgrade, it will take some time to realize that the applications consuming Kafka topics are not working properly, including Apache Atlas, which is not building the lineage or consuming the Atlas topics.

The two previous versions (HDP 2.6.3 and 2.6.4) work fine, without any problem.

Looking at the release notes (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_release-notes/content/new_features.html), Kafka is a new version in this release and there are some patch fixes:

Kafka

This release provides Kafka 1.0.0 and the following Apache patches.

  • KAFKA-4827: Kafka connect: error with special characters in connector name.
  • KAFKA-6118: Transient failure in kafka.api.SaslScramSslEndToEndAuthorizationTest.testTwoConsumersWithDifferentSaslCredentials.
  • KAFKA-6156: JmxReporter can't handle windows style directory paths.
  • KAFKA-6164: ClientQuotaManager threads prevent shutdown when encountering an error loading logs.
  • KAFKA-6167: Timestamp on streams directory contains a colon, which is an illegal character.
  • KAFKA-6179: RecordQueue.clear() does not clear MinTimestampTracker's maintained list.
  • KAFKA-6185: Selector memory leak with high likelihood of OOM in case of down conversion.
  • KAFKA-6190: GlobalKTable never finishes restoring when consuming transactional messages.
  • KAFKA-6210: IllegalArgumentException if 1.0.0 is used for inter.broker.protocol.version or log.message.format.version.
  • KAFKA-6214: Using standby replicas with an in memory state store causes Streams to crash.
  • KAFKA-6215: KafkaStreamsTest fails in trunk.
  • KAFKA-6238: Issues with protocol version when applying a rolling upgrade to 1.0.0.
  • KAFKA-6260: AbstractCoordinator not clearly handles NULL Exception.
  • KAFKA-6261: Request logging throws exception if acks=0.
  • KAFKA-6274: Improve KTable Source state store auto-generated names.

HDP 2.6.4 provided Kafka 0.10.1 with no additional Apache patches.

HDP 2.6.3 provided Kafka 0.10.1 and the following Apache patches:

  • KAFKA-4360: Controller may deadLock when autoLeaderRebalance encounter zk expired.

HDP 2.6.2 provided Kafka 0.10.1 with no additional Apache patches.

HDP 2.6.1 provided Kafka 0.10.1 with no additional Apache patches.

HDP 2.6.0 provided Kafka 0.10.1 with no additional Apache patches.

If you have any thoughts, I'd appreciate them.

Kennon

5 REPLIES

Expert Contributor

1. Did you check the output of the "describe" command for each topic? Does it match the broker IDs registered in ZooKeeper and in the meta.properties files?

2. Add --new-consumer to the command and try again.
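
For point 1, the comparison can be scripted instead of checked by eye. The sketch below is a hypothetical helper, not part of Kafka: it reads the text printed by kafka-topics.sh --describe on standard input and prints every partition leader ID that is not among the broker IDs passed as arguments (for example, the IDs listed under /kafka/brokers/ids in ZooKeeper). The name unregistered_leaders is made up for illustration, and the parsing assumes the describe output keeps its "Leader: <id>" layout.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Kafka).
# stdin : output of `kafka-topics.sh --describe`
# args  : broker IDs that are registered (e.g. taken from ZooKeeper)
# stdout: one line per partition leader ID that is NOT in the argument list
unregistered_leaders() {
    awk -v ids="$*" '
        BEGIN {
            # Build a lookup table of the registered broker IDs.
            n = split(ids, a, " ")
            for (i = 1; i <= n; i++) known[a[i]] = 1
        }
        {
            # Partition lines contain "Leader: <id>"; summary lines do not.
            for (i = 1; i < NF; i++)
                if ($i == "Leader:" && !($(i + 1) in known)) print $(i + 1)
        }
    '
}

# Example (assumes the ZooKeeper connect string used for your cluster):
#   kafka-topics.sh --zookeeper <zk-hosts>/kafka --describe --topic <topic> \
#       | unregistered_leaders 1001 1002 1003
```

An empty result means every leader shown by "describe" is a registered broker; any ID printed is worth checking against the meta.properties file on that host.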

Explorer

Hi @mrodriguez,

First of all, thank you for your prompt reply.

Yes, I had checked that before, and I ran the commands again to show you the output below. In this case I have 3 brokers. I also tested with only 1 broker and got the same problem.

This behavior appears only in this version. I also created a topic with 1 partition and replication factor 1, so that it is served by a single broker.

Atlas in this version of HDP also isn't consuming the messages. I'm debugging to find the root cause as well.

Just to be sure, I deployed a 3-node cluster from scratch with this latest version and I'm facing the same problem there, most visibly because Atlas is configured by default to consume messages from HDP's Kafka.

[root@~]# /usr/hdp/2.6.5.0-292/kafka/bin/kafka-topics.sh --zookeeper 10.0.20.101:2181,10.0.20.102:2181,10.0.20.103:2181/kafka --describe --topic kennon_2
Topic:kennon_2    PartitionCount:1    ReplicationFactor:1    Configs:
    Topic: kennon_2    Partition: 0    Leader: 1001    Replicas: 1001    Isr: 1001
[root@~]# /usr/hdp/2.6.5.0-292/kafka/bin/kafka-topics.sh --zookeeper 10.0.20.101:2181,10.0.20.102:2181,10.0.20.103:2181/kafka --describe --topic ATLAS_HOOK
Topic:ATLAS_HOOK    PartitionCount:1    ReplicationFactor:1    Configs:
    Topic: ATLAS_HOOK    Partition: 0    Leader: 1001    Replicas: 1001    Isr: 1001
[root@~]# /usr/hdp/2.6.5.0-292/kafka/bin/kafka-topics.sh --zookeeper 10.0.20.101:2181,10.0.20.102:2181,10.0.20.103:2181/kafka --describe --topic ATLAS_ENTITIES
Topic:ATLAS_ENTITIES    PartitionCount:1    ReplicationFactor:1    Configs:
    Topic: ATLAS_ENTITIES    Partition: 0    Leader: 1001    Replicas: 1001    Isr: 1001

[zk: localhost:2181(CONNECTED) 0] ls /kafka/brokers/ids/100

1003   1002   1001

cat /var/data/kafka-logs/meta.properties
#
#Thu May 31 18:47:49 IST 2018
version=0
broker.id=1003

cat /var/data/kafka-logs/meta.properties
#
#Thu May 31 18:46:34 IST 2018
version=0
broker.id=1001

cat /var/data/kafka-logs/meta.properties
#
#Thu May 31 18:46:37 IST 2018
version=0
broker.id=1002


Regards,

Kennon

Rising Star

@Kennon Rodrigues

Hi,

1. I would recommend using FQDNs instead of IP addresses in --bootstrap-server.

2. Are you facing the same issue with any test topic that you create? Could you please also describe the '__consumer_offsets' topic?

3. You can also turn on client-side debugging by changing the log level to DEBUG in the tools-log4j.properties file:

log4j.rootLogger=DEBUG, stderr
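
On point 2, some background on why that topic matters: a console consumer started with --bootstrap-server and without --partition joins a consumer group, and group coordination is served by the leaders of the __consumer_offsets partitions. A partition of that topic without a leader is therefore one possible reason (not the only one) for a consumer that connects but never receives anything. The sketch below is a hypothetical helper, not part of Kafka: it reads the describe output on standard input and prints the partition numbers that have no leader. The name leaderless_partitions is made up, and treating a leader of "-1" or "none" as "no leader" is an assumption about how your Kafka version prints it.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Kafka).
# stdin : output of `kafka-topics.sh --describe --topic __consumer_offsets`
# stdout: partition numbers whose leader is shown as -1 or none
#         (assumed markers for "no leader"; verify for your version)
leaderless_partitions() {
    awk '
        {
            part = ""; leader = ""
            # Pick up the values that follow "Partition:" and "Leader:".
            for (i = 1; i < NF; i++) {
                if ($i == "Partition:") part = $(i + 1)
                if ($i == "Leader:") leader = $(i + 1)
            }
            if (leader == "-1" || leader == "none") print part
        }
    '
}

# Example (assumes the ZooKeeper connect string used for your cluster):
#   kafka-topics.sh --zookeeper <zk-hosts>/kafka --describe \
#       --topic __consumer_offsets | leaderless_partitions
```

If this prints nothing, every partition of __consumer_offsets has a leader and the cause is more likely elsewhere, which is where the DEBUG log from point 3 helps.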

Thanks!

Explorer

Hi @dbains,

Thanks for your reply.

I am using FQDNs, not IPv4 addresses; I only replaced them with IP addresses when posting here. I'm facing the issue with all newly created topics as well. As I said before, I also deployed a 3-node cluster from scratch in my lab, and Atlas does not work out of the box there the way it did in the previous versions of HDP. I also checked __consumer_offsets and it seems normal.

If I don't specify the partition, even on a topic with only 1 partition, the consumer does not work. The log doesn't show any problem, even after I switched to debug mode.

This topic has a lot of followers, so I think more people are facing the same issue.


Kind Regards,

Kennon
