ENVIRONMENT: All Ambari versions prior to 2.4.x

SYMPTOMS: Intermittent loss of heartbeats from cluster nodes, the ambari-agent service freezing, and intermittent problems with Ambari alerts and with service status updates on the Ambari dashboard.

Ambari-agent logs:

INFO 2016-08-21 19:10:20,080 Heartbeat.py:78 - Building Heartbeat: {responseId = 139566, timestamp = 1471821020080, commandsInProgress = False, componentsMapped = True}
ERROR 2016-08-21 19:10:20,102 HostInfo.py:228 - Checking java processes failed
Traceback (most recent call last):
  File "/usr/lib/python2.6/site-packages/ambari_agent/HostInfo.py", line 211, in javaProcs
    cmd = open(os.path.join('/proc', pid, 'cmdline'), 'rb').read()
IOError: [Errno 2] No such file or directory: '/proc/24270/cmdline'
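The ERROR entry is a small race of its own: a process listed under /proc exited before its cmdline file was opened. The sketch below is a hypothetical helper, not Ambari's HostInfo code; it shows a scan that simply skips PIDs that vanish mid-scan. It assumes a Linux-style /proc layout, and proc_root is a parameter only so the behaviour can be exercised against a temporary directory.

```python
import errno
import os


def java_cmdlines(proc_root="/proc"):
    """Return {pid: argv} for processes whose command line mentions java.

    Processes can exit between os.listdir() and open(); such PIDs are
    skipped instead of raising IOError/OSError as in the traceback above.
    """
    result = {}
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-PID entries such as "self" or "meminfo"
        path = os.path.join(proc_root, name, "cmdline")
        try:
            with open(path, "rb") as f:
                raw = f.read()
        except (IOError, OSError) as exc:
            # ENOENT/ESRCH: the process is gone; EACCES: not readable by us.
            if exc.errno in (errno.ENOENT, errno.ESRCH, errno.EACCES):
                continue
            raise
        # cmdline holds NUL-separated arguments.
        argv = [a.decode("utf-8", "replace") for a in raw.split(b"\0") if a]
        if any("java" in a for a in argv):
            result[int(name)] = argv
    return result
```

Catching the error per PID keeps one short-lived process from aborting the whole scan.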

Top command output (note the ambari-agent process, PID 10098, holding 53g of resident memory):

  PID USER PR NI  VIRT RES  SHR S %CPU %MEM    TIME+ SWAP   TIME DATA COMMAND
10098 root 20  0 54.4g 53g 4540 S 54.5 14.0 18000:11  224 300,00  54g /usr/bin/python2 /usr/lib/python2.6/site-packages/ambari_agent/main.py start --expected-hostname=123.example.com
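To spot the same growth without watching top, a process's resident memory can be read from /proc/<pid>/status. This is a minimal sketch assuming Linux; the helper names and the 2 GiB default threshold are hypothetical and should be tuned to what a healthy agent uses on your hosts.

```python
import os


def vmrss_kib(status_text):
    """Extract VmRSS (resident set size, in kB) from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # Line format: "VmRSS:\t  123456 kB"
            return int(line.split()[1])
    return None  # kernel threads have no VmRSS line


def rss_above_limit(pid, limit_kib=2 * 1024 * 1024):
    """Return True if process `pid` holds more than limit_kib of resident memory."""
    with open(os.path.join("/proc", str(pid), "status")) as f:
        rss = vmrss_kib(f.read())
    return rss is not None and rss > limit_kib
```

For the agent in the top output above, 53g of RES is roughly 55 million kB, far beyond any reasonable limit.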

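ROOT CAUSE: A race condition in Python's subprocess module. In Python 2, subprocess.Popen records whether the garbage collector is enabled, disables it around fork(), and re-enables it afterwards only if it was enabled before. When two threads start subprocesses at nearly the same time, one thread can see the collector temporarily disabled by the other, disable it again after the other thread has already restored it, and then, believing it was off to begin with, never re-enable it. Garbage collection then stays off until the agent is restarted, objects in reference cycles are never freed, and the agent's memory keeps growing. This happened mostly while running alerts, because many Ambari alerts run shell commands, each from a separate thread. This is a known issue, reported in AMBARI-17539.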
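The unlucky interleaving can be replayed deterministically in a single thread. The sketch below mirrors the check/disable/fork/re-enable pattern described above and steps two simulated callers through it in the bad order. It also shows one possible mitigation, serializing Popen construction behind a lock. The helper names are hypothetical, and this is not necessarily what the AMBARI-17539 patch does.

```python
import gc
import subprocess
import sys
import threading


def replay_unlucky_interleaving():
    """Single-threaded replay of two Popen calls racing on the GC flag.

    Each simulated thread follows the Python 2 pattern:
        was_enabled = gc.isenabled(); gc.disable(); fork(); if was_enabled: gc.enable()
    Returns the GC state left behind once both "threads" have finished.
    """
    gc.enable()
    a_was_enabled = gc.isenabled()  # thread A: True
    gc.disable()                    # thread A: disable before fork()
    b_was_enabled = gc.isenabled()  # thread B: False, because A turned GC off
    if a_was_enabled:               # thread A: fork() done, restore GC
        gc.enable()
    gc.disable()                    # thread B: disable before its own fork()
    if b_was_enabled:               # thread B: believes GC was off, so skips this
        gc.enable()
    return gc.isenabled()           # nobody turns GC back on


# Hypothetical mitigation: only one thread at a time runs Popen's
# check/disable/fork/re-enable sequence.
_POPEN_LOCK = threading.Lock()


def locked_popen(*args, **kwargs):
    with _POPEN_LOCK:
        return subprocess.Popen(*args, **kwargs)


if __name__ == "__main__":
    print("GC enabled after replay:", replay_unlucky_interleaving())
    gc.enable()  # undo the replay's effect on this interpreter
```

The replay returns False. The lock only protects call sites that go through locked_popen; any code that calls subprocess.Popen directly can still race.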

SOLUTION: Upgrade to Ambari 2.4.x

WORKAROUND: Restart the ambari-agent service (ambari-agent restart); this fixes the issue only temporarily. To get a patch with the bug fix, log a case with Hortonworks (HWX) support.