
Lost HeartBeats Ambari

Hello,

I don't know why, but the main node of my cluster has lost all heartbeats.

2813-heartbeats.png

Do you know how I can solve this problem? I have already tried rebooting the node manually.

38 REPLIES

Re: Lost HeartBeats Ambari

Mentor

Please make sure the agent is up on the node:

ambari-agent status

Re: Lost HeartBeats Ambari

@Arthur GREVIN

Which Ambari version are you running?

Can you share the output of the commands below, run on the node that lost its heartbeat?

ps -ef | grep kPT

ambari-agent status

Re: Lost HeartBeats Ambari

I have version 2.1.1.

ps -ef | grep kPT finds nothing except the grep process itself:

root 8364 8276 0 17:43 pts/1 00:00:00 grep kPT

ambari-agent status gives:

Found ambari-agent PID: 2123

ambari-agent running.

Agent PID at: /var/run/ambari-agent/ambari-agent.pid

Agent out at: /var/log/ambari-agent/ambari-agent.out

Agent log at: /var/log/ambari-agent/ambari-agent.log
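
When the same check has to be repeated on several nodes, the status output above can be parsed by a script. The following is a minimal Python sketch that assumes the output format shown in this post; `parse_agent_status` is a hypothetical helper written for illustration, not part of Ambari.

```python
import re


def parse_agent_status(output):
    """Parse the text printed by `ambari-agent status`.

    The patterns below are assumptions based on the output quoted in this thread.
    """
    pid_match = re.search(r"Found ambari-agent PID:\s*(\d+)", output)
    log_match = re.search(r"Agent log at:\s*(\S+)", output)
    return {
        "running": "ambari-agent running" in output,
        "pid": int(pid_match.group(1)) if pid_match else None,
        "log_path": log_match.group(1) if log_match else None,
    }


# Sample text copied from the output in this post.
sample = """Found ambari-agent PID: 2123
ambari-agent running.
Agent PID at: /var/run/ambari-agent/ambari-agent.pid
Agent out at: /var/log/ambari-agent/ambari-agent.out
Agent log at: /var/log/ambari-agent/ambari-agent.log"""

status = parse_agent_status(sample)
print(status)
```

Note that a running agent process does not by itself mean heartbeats are reaching the server, so the agent log at the reported path is still worth reading.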

Re: Lost HeartBeats Ambari

Mentor

Please provide the agent and server logs.

Re: Lost HeartBeats Ambari

Agent log:

INFO 2016-03-16 10:18:31,247 NetUtil.py:59 - Connecting to https://dl-master:8440/ca

ERROR 2016-03-16 10:18:31,414 NetUtil.py:77 - [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)

ERROR 2016-03-16 10:18:31,414 NetUtil.py:78 - SSLError: Failed to connect. Please check openssl library versions. Refer to: https://bugzilla.redhat.com/show_bug.cgi?id=1022468 for more details.

WARNING 2016-03-16 10:18:31,417 NetUtil.py:105 - Server at https://bugzilla.redhat.com/show_bug.cgi?id=1022468 is not reachable, sleeping for 10 seconds...

WARNING 2016-03-16 10:18:31,417 NetUtil.py:105 - Server at https://bugzilla.redhat.com/show_bug.cgi?id=1022468 is not reachable, sleeping for 10 seconds...

Server log:

Mostly this:

16 Mar 2016 10:21:38,909 INFO [qtp-client-4711] MetricsPropertyProvider:518 - METRICS_COLLECTOR host is not live. Skip populating resources with metrics.

16 Mar 2016 10:21:38,910 INFO [qtp-client-4711] MetricsPropertyProvider:518 - METRICS_COLLECTOR host is not live. Skip populating resources with metrics.
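
The agent log shows CERTIFICATE_VERIFY_FAILED when connecting to https://dl-master:8440/ca. One way to tell a certificate problem apart from a plain connectivity problem is to attempt the TLS handshake twice, once with verification and once without. The following is a minimal Python 3 sketch under that assumption; `probe_tls` and its return values are hypothetical names chosen for illustration, and the agent's own Python version and TLS settings may differ from the interpreter used to run it.

```python
import socket
import ssl


def probe_tls(host, port, verify=True, timeout=5.0):
    """Try a TLS handshake and classify the outcome.

    Returns one of: 'ok', 'cert_verify_failed', 'tls_error', 'unreachable'.
    """
    context = ssl.create_default_context()
    if not verify:
        # Diagnostic only: skip certificate checks to see whether TLS works at all.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            with context.wrap_socket(raw, server_hostname=host):
                return "ok"
    except ssl.SSLCertVerificationError:
        # The server answered, but its certificate was not trusted.
        return "cert_verify_failed"
    except ssl.SSLError:
        # Some other TLS failure, for example a protocol mismatch.
        return "tls_error"
    except OSError:
        # DNS failure, connection refused, or timeout.
        return "unreachable"
```

Running `probe_tls("dl-master", 8440)` and then `probe_tls("dl-master", 8440, verify=False)` from the agent host would show whether only verification fails. If the first call returns 'cert_verify_failed' and the second returns 'ok', the network path is fine and the certificate trust is the part to look at.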

Re: Lost HeartBeats Ambari

Here is the ambari-alert log:

Exception in thread "main" java.lang.RuntimeException: java.net.ConnectException: Call From dl-s01/10.0.0.5 to dl-master:8020 failed on connection exception: java.net.ConnectException: Connection refused

Caused by: java.net.ConnectException: Call From dl-s01/10.0.0.5 to dl-master:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
    at org.apache.hadoop.ipc.Client.call(Client.java:1431)
    at org.apache.hadoop.ipc.Client.call(Client.java:1358)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:771)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2116)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
    at org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1301)
    at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1424)
    at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:596)
    at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:554)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
    ... 8 more

Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:612)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:710)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:373)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1493)
    at org.apache.hadoop.ipc.Client.call(Client.java:1397)
    ... 28 more)

Re: Lost HeartBeats Ambari

Mentor

Please check whether a firewall is running on either machine. If so, stop it or open the ports that Ambari and the services need to communicate.
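
A quick way to see which ports are reachable between the two machines is a plain TCP connect test. The following is a minimal Python sketch; `port_open` and `check_ports` are hypothetical helper names. Ports 8440 and 8020 are the ones that appear in the logs in this thread, and any other port you add to the list is your own assumption about the cluster's configuration.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def check_ports(host, ports, timeout=3.0):
    """Map each port to True (reachable) or False (not reachable)."""
    return {port: port_open(host, port, timeout) for port in ports}
```

For example, `check_ports("dl-master", [8440, 8020])` run from dl-s01 would report both ports seen in the logs. A result of False does not distinguish a firewall from a stopped service: a fast "connection refused" often means nothing is listening on the port, while a timeout is more typical of dropped packets.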

Re: Lost HeartBeats Ambari

Mentor

I see the server log says the Ambari Metrics Collector is not live. Please check the status of all Metrics Monitors and the Metrics Collector.

Re: Lost HeartBeats Ambari

I don't know if this is the status you mean:

2841-metrics.png

What would be the next step to solve the problem?
