
ZooKeeper monitoring anomaly: Failed to collect information on ZooKeeper


New Contributor

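The monitoring charts for the ZooKeeper nodes show gaps, the ZooKeeper Canary check frequently fails, and the ZooKeeper quorum status often cannot be retrieved.

I have checked resource usage on the zk nodes: CPU, network I/O, and disk I/O are all normal and lightly loaded, so there is no resource bottleneck.

The ZooKeeper logs contain nothing useful (no related ERROR or WARN entries).

The SERVICEMONITOR log on the Cloudera Manager node, however, contains a large number of related errors.

Part of the log follows: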

 

2020-06-06 20:45:46,186 WARN com.cloudera.cmon.firehose.polling.zookeeper.ZooKeeperServiceStateFetcher: (5 skipped) Failed to collect information on ZooKeeper Server zookeeper-SERVER-03370f14c890ce3418af85390ea3dff1
java.rmi.UnmarshalException: Error unmarshaling return header; nested exception is: 
	java.net.SocketTimeoutException: Read timed out
	at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:236)
	at sun.rmi.server.UnicastRef.invoke(UnicastRef.java:161)
	at com.sun.jmx.remote.internal.PRef.invoke(Unknown Source)
	at javax.management.remote.rmi.RMIConnectionImpl_Stub.getAttributes(Unknown Source)
	at javax.management.remote.rmi.RMIConnector$RemoteMBeanServerConnection.getAttributes(RMIConnector.java:928)
	at com.cloudera.cmf.cdhclient.common.MemoryMXBeanWrapper.create(MemoryMXBeanWrapper.java:92)
	at com.cloudera.enterprise.JmxUtil.getMemoryMXBeanWrapper(JmxUtil.java:258)
	at com.cloudera.cmon.firehose.polling.zookeeper.ZooKeeperServiceStateFetcher.getMetricsProviders(ZooKeeperServiceStateFetcher.java:318)
	at com.cloudera.cmon.firehose.polling.zookeeper.ZooKeeperServiceStateFetcher.getZooKeeperServerInfo(ZooKeeperServiceStateFetcher.java:246)
	at com.cloudera.cmon.firehose.polling.zookeeper.ZooKeeperServiceStateFetcher.doWork(ZooKeeperServiceStateFetcher.java:177)
	at com.cloudera.cmon.firehose.polling.zookeeper.ZooKeeperServiceStateFetcher.doWork(ZooKeeperServiceStateFetcher.java:61)
	at com.cloudera.cmon.firehose.polling.CdhTask$InstrumentedWork.doWork(CdhTask.java:230)
	at com.cloudera.cmf.cdhclient.CdhExecutor$1.call(CdhExecutor.java:125)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method)
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
	at java.net.SocketInputStream.read(SocketInputStream.java:171)
	at java.net.SocketInputStream.read(SocketInputStream.java:141)
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
	at java.io.BufferedInputStream.read(BufferedInputStream.java:265)
	at java.io.DataInputStream.readByte(DataInputStream.java:265)
	at sun.rmi.transport.StreamRemoteCall.executeCall(StreamRemoteCall.java:222)
	... 16 more
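
To see whether the timeouts hit one ZooKeeper server or all of them, and at which hours, the WARN lines can be tallied. The following is a minimal sketch, not a Cloudera tool: the function name `tally_failures` is hypothetical, the regular expression is modeled only on the single WARN line shown above, and treating "(N skipped)" as N suppressed repeats of the same warning is an assumption.

```python
import re
from collections import Counter

# Pattern modeled on the one WARN line in the excerpt above; other
# Service Monitor versions may format the line differently.
WARN_RE = re.compile(
    r"^(?P<hour>\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2},\d{3} WARN "
    r".*ZooKeeperServiceStateFetcher: (?:\((?P<skipped>\d+) skipped\) )?"
    r"Failed to collect information on ZooKeeper Server (?P<server>\S+)"
)


def tally_failures(lines):
    """Count collection failures per (hour, server role name)."""
    counts = Counter()
    for line in lines:
        m = WARN_RE.match(line)
        if m is None:
            continue  # stack-trace lines and unrelated log entries
        # Assumption: "(N skipped)" means N earlier identical warnings
        # were suppressed, so this line stands for N + 1 failures.
        skipped = int(m.group("skipped") or 0)
        counts[(m.group("hour"), m.group("server"))] += 1 + skipped
    return counts
```

Passing an open SERVICEMONITOR log file to `tally_failures` and printing `counts.most_common()` would show whether the failures cluster on a single role or a particular time window.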

 

 

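I also suspected that a network problem was causing the timeout, but monitoring for the other services is normal and does not show this behavior.

I have not found a solution yet and would appreciate any help.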
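
One way to narrow this down is to time a request against the ZooKeeper client port, independently of the JMX/RMI path that appears in the stack trace. The sketch below sends the `srvr` four-letter command and reports the server mode and the round-trip time. It assumes the default client port 2181 and that `srvr` is permitted by `4lw.commands.whitelist` on ZooKeeper 3.5.3 and later; the function names and the 5-second timeout are illustrative choices. A fast reply here alongside continued JMX timeouts would point at the JMX channel rather than at ZooKeeper itself, though it does not prove it.

```python
import socket
import time


def four_letter_word(host, port=2181, cmd="srvr", timeout=5.0):
    """Send a four-letter command; return (reply_text, seconds_elapsed).

    Raises socket.timeout (an OSError) if the server does not answer in time.
    """
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # ZooKeeper closes the connection after replying
                break
            chunks.append(data)
    reply = b"".join(chunks).decode("utf-8", "replace")
    return reply, time.monotonic() - start


def parse_mode(srvr_reply):
    """Return the 'Mode:' value (leader, follower, standalone) or None."""
    for line in srvr_reply.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

Running `four_letter_word("zk-host")` (hostname is a placeholder) from the Cloudera Manager node in a loop, and logging the elapsed time with `parse_mode(reply)`, gives a rough latency baseline to compare against the timestamps of the WARN entries.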

1 REPLY

Re: ZooKeeper monitoring anomaly: Failed to collect information on ZooKeeper

Community Manager

English translation of the post:

The monitoring charts for the ZooKeeper nodes show gaps, the ZooKeeper Canary check often fails, and the ZooKeeper quorum status often cannot be obtained.

I have checked resource usage on the zk nodes. CPU, network I/O, and disk I/O are all normal and lightly loaded, so there is no resource bottleneck.

There is no useful information in the ZooKeeper log (no related ERROR or WARN entries).

A large number of related errors were found in the SERVICEMONITOR log on the Cloudera Manager node.

Some of the log entries are as follows:

(error log)

 

I also suspected that the timeout was caused by a network problem, but monitoring for the other services is normal and does not show this behavior. I have not found a solution yet and would appreciate your help.

 


Vidya Sargur, Community Manager
