Configure JMX Exporter (java agent) for Kafka - CDP Private Base 7.1.9
Created 01-29-2025 04:42 AM
Hello,
I have a JMX exporter configured for the Kafka brokers to send metrics to Prometheus, which lets me monitor Kafka via Grafana dashboards. Everything works, but I am stuck on a strange issue.
Background:
The JMX exporter is configured for the Kafka brokers as follows. In Kafka Broker Environment Advanced Configuration Snippet (Safety Valve), I added the key KAFKA_JMX_OPTS with this value:
-javaagent:/var/lib/prometheus_jmx_config/jmx_prometheus_javaagent-0.20.0.jar=9091:/var/lib/prometheus_jmx_config/kafka_jmx_exporter.yml
The jar and yml files provided in the configuration are in place and all works okay. We are able to get required metrics to Grafana via Prometheus.
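For anyone checking which port the exporter will try to bind, the agent argument above can be split into its parts. This is only a minimal sketch: `parse_javaagent` is a hypothetical helper, not part of the JMX exporter or Cloudera Manager, and it assumes the `-javaagent:<jar>=[<host>:]<port>:<config>` layout used in this post.

```python
def parse_javaagent(arg):
    """Split a JMX exporter -javaagent argument into jar, host, port and config.

    Hypothetical helper for illustration; assumes the layout
    -javaagent:<jar>=[<host>:]<port>:<config>.
    """
    prefix = "-javaagent:"
    if not arg.startswith(prefix):
        raise ValueError("not a -javaagent argument")
    jar, sep, options = arg[len(prefix):].partition("=")
    if not sep:
        raise ValueError("missing '=<port>:<config>' part")
    first, _, remainder = options.partition(":")
    if first.isdigit():
        # Form used in this post: <port>:<config>
        host, port, config = None, int(first), remainder
    else:
        # Optional form: <host>:<port>:<config>
        second, _, config = remainder.partition(":")
        host, port = first, int(second)
    return {"jar": jar, "host": host, "port": port, "config": config}


agent = parse_javaagent(
    "-javaagent:/var/lib/prometheus_jmx_config/jmx_prometheus_javaagent-0.20.0.jar"
    "=9091:/var/lib/prometheus_jmx_config/kafka_jmx_exporter.yml"
)
print(agent["port"], agent["config"])
```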
Issue:
A normal restart of the Kafka brokers completes without problems. A rolling restart, however, fails with the error below in the stderr log file:
+ [[ healthy partitions stay healthy == *\p\a\r\t\i\t\i\o\n\s\ \s\t\a\y\ \h\e\a\l\t\h\y* ]]
+ call_kafka_topics --at-min-isr-partitions
+ option=--at-min-isr-partitions
+ get_property bootstrap.servers /var/run/cloudera-scm-agent/process/18315-kafka-KAFKA_BROKER-kafka_broker_rolling_restart_pre_check/rolling_restart_check_before_stop_admin_client_configs.properties BOOTSTRAP_SERVERS
++ dirname /opt/cloudera/cm-agent/service/common/cloudera-config.sh
+ GET_PROPERTY_PY_DIR=/opt/cloudera/cm-agent/service/common
++ python -u /opt/cloudera/cm-agent/service/common/get_property.py bootstrap.servers /var/run/cloudera-scm-agent/process/18315-kafka-KAFKA_BROKER-kafka_broker_rolling_restart_pre_check/rolling_restart_check_before_stop_admin_client_configs.properties
+ value=
+ eval 'BOOTSTRAP_SERVERS='\'''\'''
++ BOOTSTRAP_SERVERS=
+ [[ -z '' ]]
+ BOOTSTRAP_SERVERS=<broker>:6667
+ kafka-topics --at-min-isr-partitions --describe --bootstrap-server <broker>:6667 --command-config /var/run/cloudera-scm-agent/process/18315-kafka-KAFKA_BROKER-kafka_broker_rolling_restart_pre_check/rolling_restart_check_before_stop_admin_client_configs.properties
Exception in thread "main" java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndStartAgent(InstrumentationImpl.java:513)
at java.instrument/sun.instrument.InstrumentationImpl.loadClassAndCallPremain(InstrumentationImpl.java:525)
Caused by: java.net.BindException: Address already in use
at java.base/sun.nio.ch.Net.bind0(Native Method)
at java.base/sun.nio.ch.Net.bind(Net.java:459)
at java.base/sun.nio.ch.Net.bind(Net.java:448)
at java.base/sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:227)
at java.base/sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:80)
at jdk.httpserver/sun.net.httpserver.ServerImpl.<init>(ServerImpl.java:142)
at jdk.httpserver/sun.net.httpserver.HttpServerImpl.<init>(HttpServerImpl.java:50)
at jdk.httpserver/sun.net.httpserver.DefaultHttpServerProvider.createHttpServer(DefaultHttpServerProvider.java:35)
at jdk.httpserver/com.sun.net.httpserver.HttpServer.create(HttpServer.java:137)
at io.prometheus.jmx.shaded.io.prometheus.client.exporter.HTTPServer$Builder.build(HTTPServer.java:365)
at io.prometheus.jmx.common.http.HTTPServerFactory.createHTTPServer(HTTPServerFactory.java:123)
at io.prometheus.jmx.JavaAgent.premain(JavaAgent.java:60)
... 6 more
*** java.lang.instrument ASSERTION FAILED ***: "result" with message agent load/premain call failed at ./src/java.instrument/share/native/libinstrument/JPLISAgent.c line: 422
/var/run/cloudera-scm-agent/process/18315-kafka-KAFKA_BROKER-kafka_broker_rolling_restart_pre_check/scripts/broker_rolling_restart_checker.sh: line 63: 24738 Aborted kafka-topics ${option} --describe --bootstrap-server "${BOOTSTRAP_SERVERS}" --command-config "${prop_file}" >> "${describe_partitions_output}"
As I understand it, the error indicates that a port is already in use while the Kafka topic health checks run. I assume it is complaining about 6667, since the error does not clearly state which port is the problem. However, a rolling restart without this configuration works fine, so I am unsure whether this is really a failure of the kafka-topics health check.
Diagnosis / troubleshooting / findings:
- When we do a full restart of the Kafka brokers with this configuration in place, it works; only the rolling restart fails.
- We tried the Cloudera articles below, without success:
https://my.cloudera.com/knowledge/How-to-configure-JMX-access-for-Kafka-in-CDP?id=390204
Since we are on a non-SSL cluster, this is the configuration we tried while following the first link:
-server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -XX:+DisableExplicitGC -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.host=127.0.0.1 -Djava.rmi.server.hostname=127.0.0.1 -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/kafka-configuring/topics/kafka-config-rolling...
Since we are on a non-SSL cluster, this is the configuration we tried for testing while following the second link:
bootstrap.servers=<broker-1>:6667,<broker-2>:6667,<broker-3>:6667,<broker-4>:6667
security.protocol=SASL_PLAINTEXT
ssl.client.auth=none
sasl.mechanism=GSSAPI
sasl.kerberos.service.name=kafka
sasl.jaas.config=com.sun.security.auth.module.Krb5LoginModule required useKeyTab=true storeKey=true keyTab="/var/run/cloudera-scm-agent/process/18315-kafka-KAFKA_BROKER-kafka_broker_rolling_restart_pre_check/kafka.keytab" principal="<KAFKA Principal>";
- The keytab location is hard-coded for just one broker for testing, and we made sure the keytab file exists at that location. We tried a rolling restart of this broker, but it failed.
Without the above JMX configuration in the Advanced Configuration Snippet, a rolling restart works fine.
I am unsure what exactly has been configured incorrectly or what is missing. Please share any thoughts, recommendations, or suggestions.
Thanks
snm1523
Created 02-03-2025 01:48 AM
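For the benefit of anyone looking for a solution to this: the -javaagent parameter needs to be added to the "Additional Broker Java Options" configuration property, instead of to KAFKA_JMX_OPTS in the Kafka Broker Environment Advanced Configuration Snippet (Safety Valve) as described in the original post.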
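A likely explanation, inferred from the stack trace rather than confirmed in this thread: the BindException is raised inside io.prometheus.jmx.JavaAgent.premain while kafka-topics is starting, which suggests the pre-check's JVM also picked up KAFKA_JMX_OPTS and tried to open the exporter port 9091 while the running broker still held it. That would point to 9091 rather than 6667 as the port in conflict. The sketch below reproduces the same failure mode with plain sockets; it uses an ephemeral local port and a hypothetical helper name, and does not touch Kafka or the exporter.

```python
import errno
import socket


def second_bind_fails_with_addr_in_use():
    """Bind a listener, then try to bind the same port again from a second socket.

    Stands in for the broker's agent (first bind) and the agent loaded by a
    CLI tool (second bind). Returns True if the second bind fails with
    EADDRINUSE.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        second.close()
        first.close()


print(second_bind_fails_with_addr_in_use())
```

With the agent set only in the broker's own Java options, the command-line tools run by the rolling restart pre-check would start without the agent and so would not attempt the second bind.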
