Created 06-01-2018 11:27 AM
Dear all,
After upgrading from HDP 2.6.4 to 2.6.5 in my LAB, to check all components before upgrading the DEV and PROD clusters, I realized that Atlas was not working properly: the lineage information was missing. Digging deeper, I found an issue in Kafka: the Kafka console consumer does not receive any messages with these simple commands:
hduser@> kafka-console-consumer.sh --bootstrap-server 10.0.20.101:6667,10.0.20.102:6667,10.0.20.103:6667 --topic kennon_2 --security-protocol SASL_PLAINTEXT --from-beginning
hduser@> kafka-console-consumer.sh --bootstrap-server 10.0.20.101:6667,10.0.20.102:6667,10.0.20.103:6667 --topic ATLAS_HOOK --security-protocol SASL_PLAINTEXT --from-beginning
hduser@> kafka-console-consumer.sh --bootstrap-server 10.0.20.101:6667,10.0.20.102:6667,10.0.20.103:6667 --topic ATLAS_ENTITIES --security-protocol SASL_PLAINTEXT --from-beginning
No messages are consumed, although I checked the producer and all the messages are reaching the topic. If I add the option '--partition 0', for example, I see only the messages from that partition, and if I replace --bootstrap-server with --zookeeper, I see almost all the messages (yes, almost, not all: roughly 80% of them). There is no error in the log, because the connection is established and the consumer just keeps waiting for messages.
hduser@> kafka-console-consumer.sh --bootstrap-server 10.0.20.101:6667,10.0.20.102:6667,10.0.20.103:6667 --topic kennon_2 --security-protocol SASL_PLAINTEXT --from-beginning --partition 0
or
hduser@> kafka-console-consumer.sh --zookeeper 10.0.20.101:2181,10.0.20.102:2181,10.0.20.103:2181/kafka --topic kennon_2 --security-protocol SASL_PLAINTEXT --from-beginning
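One way to put a number on the "almost all, like 80%" observation is to read every partition directly with --partition (the mode that does return messages here) and compare the total with what the group-based consumer returns. This is a hypothetical sketch: the function name count_by_partition and the CONSUMER_CMD variable are illustrative, it assumes one line per message, and it relies on the console consumer's --timeout-ms option so each read exits once the partition goes quiet. Extra arguments such as --security-protocol are passed through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count the messages in each partition of a topic by
# reading every partition directly with --partition, then print the total.
# Assumes one line per message. CONSUMER_CMD is an illustrative override
# (e.g. a full path to the script); it defaults to the console consumer.
CONSUMER_CMD="${CONSUMER_CMD:-kafka-console-consumer.sh}"

count_by_partition() {
  local bootstrap="$1" topic="$2" partitions="$3"
  shift 3   # remaining arguments (e.g. --security-protocol ...) are passed through
  local p n total=0
  for ((p = 0; p < partitions; p++)); do
    # --timeout-ms makes the consumer exit after 10 s without new messages;
    # stderr (summary and timeout messages) is discarded so only records are counted
    n=$("$CONSUMER_CMD" --bootstrap-server "$bootstrap" --topic "$topic" \
          --partition "$p" --from-beginning --timeout-ms 10000 "$@" 2>/dev/null | wc -l)
    n=$((n))   # strip any padding added by wc
    echo "partition $p: $n"
    total=$((total + n))
  done
  echo "total: $total"
}
```

For example, count_by_partition 10.0.20.101:6667 kennon_2 1 --security-protocol SASL_PLAINTEXT prints the per-partition counts and a total that can be set against the number of messages the --bootstrap-server consumer (without --partition) or the --zookeeper consumer actually prints.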
In another LAB, I deployed a small cluster from scratch to test HDP 2.6.5, and I'm facing the same issue with Kafka there.
My concern is that if we upgrade, it may take a while to realize that the applications consuming Kafka topics are not working properly, including Apache Atlas, which is not building lineage or consuming the ATLAS_HOOK and ATLAS_ENTITIES topics.
In the two previous versions (HDP 2.6.3 and 2.6.4), everything works fine without any problem.
Looking at the release notes (https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_release-notes/content/new_features.html), Kafka has been upgraded to a new version and includes some patch fixes:
Kafka: This release provides Kafka 1.0.0 and the following Apache patches.
HDP 2.6.4 provided Kafka 0.10.1 with no additional Apache patches.
HDP 2.6.3 provided Kafka 0.10.1 and the following Apache patches:
HDP 2.6.2 provided Kafka 0.10.1 with no additional Apache patches.
HDP 2.6.1 provided Kafka 0.10.1 with no additional Apache patches.
HDP 2.6.0 provided Kafka 0.10.1 with no additional Apache patches.
Please, guys, if you have any thoughts, I'd appreciate them.
Kennon
Created 06-05-2018 04:08 PM
1. Did you check the output of the "describe" command for each topic? Does it match the broker IDs registered in ZooKeeper and in the meta.properties file?
2. Add --new-consumer to the command and try again.
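The check in point 1 can be scripted. A minimal sketch, assuming the registered broker IDs are collected separately (from ls /kafka/brokers/ids in the ZooKeeper shell, or from broker.id in each meta.properties) and passed as arguments; unregistered_brokers is a hypothetical helper name. It prints every ID that appears in the Leader, Replicas or Isr columns of kafka-topics.sh --describe output but is missing from that list, so a partition without a leader (typically shown as Leader: -1) is reported too.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kafka-topics.sh --describe` output on stdin and
# print each broker ID from the Leader, Replicas and Isr columns that is not
# among the registered IDs given as arguments (e.g. from `ls /kafka/brokers/ids`
# or broker.id in meta.properties). A leaderless partition (Leader: -1) is also
# reported, because -1 is never a registered ID.
unregistered_brokers() {
  local registered=" $* "   # pad with spaces so matching is on whole IDs
  awk '{
    for (i = 1; i < NF; i++)
      if ($i == "Leader:" || $i == "Replicas:" || $i == "Isr:") {
        n = split($(i + 1), ids, ",")
        for (j = 1; j <= n; j++) print ids[j]
      }
  }' | sort -u | while read -r id; do
    case "$registered" in
      *" $id "*) ;;        # referenced broker is registered
      *) echo "$id" ;;     # referenced but not registered
    esac
  done
}
```

Usage sketch: pipe kafka-topics.sh --zookeeper <zk-quorum>/kafka --describe --topic kennon_2 into unregistered_brokers 1001 1002 1003; empty output means every broker referenced by the topic is registered.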
Created 06-06-2018 09:55 AM
Hi @mrodriguez,
First, thank you for your prompt reply.
Yes, I had already checked it, and I've just rerun the commands to show you the output below. In this case I have 3 brokers. I also tested with only 1 broker and had the same problem.
This behavior only occurs in this version. I also created the topics with 1 partition and a replication factor of 1 so that they are served by a single broker.
Atlas in this version of HDP is not consuming the messages either. I'm also debugging to find the root cause.
Just to be sure, I deployed a 3-node cluster from scratch with this latest version, and I'm facing the same problem, mainly because Atlas is configured by default to consume messages from HDP's Kafka.
[root@~]# /usr/hdp/2.6.5.0-292/kafka/bin/kafka-topics.sh --zookeeper 10.0.20.101:2181,10.0.20.102:2181,10.0.20.103:2181/kafka --describe --topic kennon_2
Topic:kennon_2 PartitionCount:1 ReplicationFactor:1 Configs:
Topic: kennon_2 Partition: 0 Leader: 1001 Replicas: 1001 Isr: 1001
[root@~]# /usr/hdp/2.6.5.0-292/kafka/bin/kafka-topics.sh --zookeeper 10.0.20.101:2181,10.0.20.102:2181,10.0.20.103:2181/kafka --describe --topic ATLAS_HOOK
Topic:ATLAS_HOOK PartitionCount:1 ReplicationFactor:1 Configs:
Topic: ATLAS_HOOK Partition: 0 Leader: 1001 Replicas: 1001 Isr: 1001
[root@~]# /usr/hdp/2.6.5.0-292/kafka/bin/kafka-topics.sh --zookeeper 10.0.20.101:2181,10.0.20.102:2181,10.0.20.103:2181/kafka --describe --topic ATLAS_ENTITIES
Topic:ATLAS_ENTITIES PartitionCount:1 ReplicationFactor:1 Configs:
Topic: ATLAS_ENTITIES Partition: 0 Leader: 1001 Replicas: 1001 Isr: 1001
[zk: localhost:2181(CONNECTED) 0] ls /kafka/brokers/ids
1003 1002 1001
cat /var/data/kafka-logs/meta.properties
#
#Thu May 31 18:47:49 IST 2018
version=0
broker.id=1003
cat /var/data/kafka-logs/meta.properties
#
#Thu May 31 18:46:34 IST 2018
version=0
broker.id=1001
cat /var/data/kafka-logs/meta.properties
#
#Thu May 31 18:46:37 IST 2018
version=0
broker.id=1002
Regards,
Kennon
Created 06-07-2018 10:45 PM
Hi,
1. I would recommend using FQDNs instead of IP addresses in --bootstrap-server.
2. Are you facing the same issue with any test topic that you create? Could you please also describe the __consumer_offsets topic?
3. You can also turn on client-side debugging by changing the log level to DEBUG in the tools-log4j.properties file:
log4j.rootLogger=DEBUG, stderr
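A variant of point 3 that leaves the shared tools-log4j.properties untouched: in the Apache Kafka scripts, kafka-run-class.sh only falls back to tools-log4j.properties when KAFKA_LOG4J_OPTS is unset, so pointing that variable at a DEBUG copy affects just the current shell. This is a sketch under assumptions: the function name enable_client_debug and the default /tmp path are illustrative, and the location of the original file depends on your installation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a copy of a tools-log4j.properties file with the
# root logger switched to DEBUG (appenders kept), then point KAFKA_LOG4J_OPTS
# at the copy so only consumers started from this shell log at DEBUG.
# Must be run in the current shell (sourced), otherwise the export is lost.
enable_client_debug() {
  local src="$1" dest="${2:-/tmp/tools-log4j-debug.properties}"
  # "log4j.rootLogger=WARN, stderr" becomes "log4j.rootLogger=DEBUG, stderr"
  sed 's/^log4j\.rootLogger=[^,]*/log4j.rootLogger=DEBUG/' "$src" > "$dest" || return 1
  export KAFKA_LOG4J_OPTS="-Dlog4j.configuration=file:$dest"
}
```

Source the function in the shell you run the consumer from, call it with the path of your tools-log4j.properties, then rerun the kafka-console-consumer.sh command; running unset KAFKA_LOG4J_OPTS afterwards restores the default logging.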
Thanks!
Created 06-11-2018 09:42 AM
HI @dbains,
Thanks for your reply.
I'm actually using FQDNs rather than IPv4 addresses; I only posted IP addresses here. I'm facing the issue with all newly created topics as well. As I said before, I also deployed a 3-node cluster from scratch in my lab, and Atlas does not work out of the box as it did in previous versions of HDP. I also checked __consumer_offsets, and it seems normal.
If I don't specify the partition, the consumer doesn't work, even with only 1 partition. The log doesn't show any problem, even after switching to debug mode.
This thread has a lot of followers, so I think more people are facing the same issue.
Kind Regards,
Kennon