
Ambari Metrics Monitor failed to start

Ambari Metrics Monitor failed to start on one of my two hosts. Here is the error:

stderr:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 58, in <module>
    AmsMonitor().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 28, in install
    self.install_packages(env, exclude_packages = ['ambari-metrics-collector', 'ambari-metrics-grafana'])
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 410, in install_packages
    retry_count=agent_stack_retry_count)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 54, in action_install
    self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 49, in install_package
    self.checked_call_with_retries(cmd, sudo=True, logoutput=self.get_logoutput())
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 83, in checked_call_with_retries
    return self._call_with_retries(cmd, is_checked=True, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 91, in _call_with_retries
    code, out = func(cmd, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-monitor' returned 1. Error: Nothing to do

stdout:
2018-04-03 05:48:24,805 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.3.6.0-3796
2018-04-03 05:48:24,805 - Checking if need to create versioned conf dir /etc/hadoop/2.3.6.0-3796/0
2018-04-03 05:48:24,805 - call['conf-select create-conf-dir --package hadoop --stack-version 2.3.6.0-3796 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2018-04-03 05:48:24,837 - call returned (1, '/etc/hadoop/2.3.6.0-3796/0 exist already', '')
2018-04-03 05:48:24,837 - checked_call['conf-select set-conf-dir --package hadoop --stack-version 2.3.6.0-3796 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False}
2018-04-03 05:48:24,868 - checked_call returned (0, '')
2018-04-03 05:48:24,868 - Ensuring that hadoop has the correct symlink structure
2018-04-03 05:48:24,869 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2018-04-03 05:48:24,871 - Group['hadoop'] {}
2018-04-03 05:48:24,873 - Group['users'] {}
2018-04-03 05:48:24,873 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,875 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,876 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,877 - User['oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2018-04-03 05:48:24,878 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2018-04-03 05:48:24,879 - User['tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2018-04-03 05:48:24,880 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,881 - User['sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,882 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,883 - User['hcat'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,884 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,885 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-04-03 05:48:24,888 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2018-04-03 05:48:24,893 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2018-04-03 05:48:24,894 - Group['hdfs'] {}
2018-04-03 05:48:24,894 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'hdfs']}
2018-04-03 05:48:24,895 - FS Type:
2018-04-03 05:48:24,895 - Directory['/etc/hadoop'] {'mode': 0755}
2018-04-03 05:48:24,919 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2018-04-03 05:48:24,920 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0777}
2018-04-03 05:48:24,937 - Repository['HDP-2.3'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.6.0', 'action': ['create'], 'components': ['HDP', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP', 'mirror_list': None}
2018-04-03 05:48:24,949 - File['/etc/yum.repos.d/HDP.repo'] {'content': '[HDP-2.3]\nname=HDP-2.3\nbaseurl=http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.6.0\n\npath=/\nenabled=1\ngpgcheck=0'}
2018-04-03 05:48:24,950 - Repository['HDP-UTILS-1.1.0.20'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos6', 'action': ['create'], 'components': ['HDP-UTILS', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP-UTILS', 'mirror_list': None}
2018-04-03 05:48:24,955 - File['/etc/yum.repos.d/HDP-UTILS.repo'] {'content': '[HDP-UTILS-1.1.0.20]\nname=HDP-UTILS-1.1.0.20\nbaseurl=http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos6\n\npath=/\nenabled=1\ngpgcheck=0'}
2018-04-03 05:48:24,956 - Package['unzip'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2018-04-03 05:48:25,089 - Skipping installation of existing package unzip
2018-04-03 05:48:25,089 - Package['curl'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2018-04-03 05:48:25,109 - Skipping installation of existing package curl
2018-04-03 05:48:25,110 - Package['hdp-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2018-04-03 05:48:25,130 - Skipping installation of existing package hdp-select
2018-04-03 05:48:25,355 - Package['ambari-metrics-monitor'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2018-04-03 05:48:25,488 - Installing package ambari-metrics-monitor ('/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-monitor')
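The line that matters in a trace like this is the final resource_management Fail message: the command Ambari ran, its exit code, and the tool's own error text (here yum's "Nothing to do"). A minimal sketch for pulling those three fields out of pasted agent output, assuming the message keeps the "Execution of '<cmd>' returned <code>. <message>" shape shown above; `parse_fail` is a hypothetical helper, not part of Ambari.

```python
import re

# Matches the Fail message shape seen above; the message ends at the next
# "stdout:" marker, a newline, or the end of the text.
FAIL_RE = re.compile(
    r"Execution of '(?P<cmd>[^']*)' returned (?P<code>\d+)\. "
    r"(?P<msg>[^\n]*?)(?=\s+stdout:|\n|$)"
)

def parse_fail(text):
    """Return (command, exit_code, message) from Ambari agent output, or None."""
    m = FAIL_RE.search(text)
    if m is None:
        return None
    return m.group("cmd"), int(m.group("code")), m.group("msg").strip()

sample = ("resource_management.core.exceptions.Fail: Execution of "
          "'/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-monitor' "
          "returned 1. Error: Nothing to do stdout: 2018-04-03 05:48:24,805 - ...")
print(parse_fail(sample))
```

The lookahead lets the same pattern work whether the output was pasted as one long line or with "stdout:" on its own line.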


One other thing: the metrics data is unavailable for only one master host. For the other host, the graphs are visible.

Super Mentor

@Krupal Jagtap

You can use the Ambari API approach to delete the AMS monitor component.

Example:

# curl  -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://hdfcluster1.example.com:8080/api/v1/clusters/TestCluster/hosts/hdfcluster3.example.com/host_c...



Here "hdfcluster3.example.com" is the host from which we want to delete the Metrics Monitor.
"hdfcluster1.example.com:8080" is the Ambari Server host and port.
"TestCluster" is the cluster name.

Please replace these values accordingly.
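The same call can be built from a script, which makes it easy to plug in the right host, cluster, and server values. The URL in the post above is cut off; this sketch assumes the standard Ambari v1 host-component endpoint (`/api/v1/clusters/<cluster>/hosts/<host>/host_components/<component>`) with METRICS_MONITOR as the component name, so verify the path against your Ambari version. The `admin`/`admin` credentials mirror the curl example; `build_delete_host_component` is a hypothetical helper.

```python
import base64
import urllib.request

def build_delete_host_component(server, cluster, host, component,
                                user="admin", password="admin"):
    """Build (but do not send) an Ambari REST DELETE request for one host component."""
    # ASSUMPTION: standard Ambari v1 host_components endpoint.
    url = (f"http://{server}/api/v1/clusters/{cluster}"
           f"/hosts/{host}/host_components/{component}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(url, method="DELETE")
    req.add_header("Authorization", f"Basic {token}")  # same as curl -u
    # Ambari expects this header on modifying requests, as in the curl example.
    req.add_header("X-Requested-By", "ambari")
    return req

req = build_delete_host_component("hdfcluster1.example.com:8080", "TestCluster",
                                  "hdfcluster3.example.com", "METRICS_MONITOR")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it lets you print and check the URL before anything is deleted.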

Once it is deleted, it is better to clean up the "/etc/ambari-metrics-monitor" directory on that host. Rather than deleting it outright, take a backup by moving it aside (and do the same for the "resource_monitoring" Python package):

# mv /etc/ambari-metrics-monitor  /etc/ambari-metrics-monitor_BKP
# mv /usr/lib/python2.6/site-packages/resource_monitoring /usr/lib/python2.6/site-packages/resource_monitoring_BKP
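If several hosts need the same cleanup, the two mv commands above can be scripted roughly as follows. This is a hypothetical helper (`move_aside` is not part of Ambari). It assumes it runs as root on the affected host, defaults to a dry run, skips paths that do not exist, and refuses to overwrite an existing _BKP copy.

```python
import os
import shutil

# The two paths moved aside in the post above.
MONITOR_PATHS = [
    "/etc/ambari-metrics-monitor",
    "/usr/lib/python2.6/site-packages/resource_monitoring",
]

def move_aside(paths, suffix="_BKP", dry_run=True):
    """Rename each existing path to <path><suffix>; return the (src, dst) pairs."""
    moves = []
    for src in paths:
        dst = src + suffix
        if not os.path.exists(src):
            continue  # nothing to back up on this host
        if os.path.exists(dst):
            raise FileExistsError(f"backup already exists: {dst}")
        if not dry_run:
            shutil.move(src, dst)
        moves.append((src, dst))
    return moves

print(move_aside(MONITOR_PATHS))  # dry run: only reports what would be moved
```

Run it once as a dry run, check the reported pairs, then call it with `dry_run=False`.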


Then reinstall the Metrics Monitor on this host from the Ambari UI.

Hi,

I tried the curl command to remove it, and it returned the following error:

{ "status" : 500, "message" : "org.apache.ambari.server.controller.spi.SystemException: An internal system exception occurred: Host Component cannot be removed, clusterName=Qc, serviceName=AMBARI_METRICS, componentName=METRICS_MONITOR, hostname=itxcqchdp01.catmdev.com, request={ clusterName=CardtronicsQc, serviceName=AMBARI_METRICS, componentName=METRICS_MONITOR, hostname=itxcqchdp01.catmdev.com, desiredState=null, state=null, desiredStackId=null, staleConfig=null, adminState=null }" }

The problem seems to be on the main host itself.

Super Mentor

@Krupal Jagtap

We will have to look at the complete "ambari-server.log" to understand what might be the issue.

However, it looks like it might be due to a database inconsistency.

Take a fresh Ambari DB dump (as a backup).

Is it possible to just stop the "Ambari Metrics" service, delete the whole "Ambari Metrics" service from the Ambari UI, and then reinstall it? That is a quick way to attempt a fix. (If you do not have much AMS data, deleting and reinstalling this service might be the better option.)


Yes, of course. I do not have much AMS data, and I think deleting it completely and reinstalling it is the better option. I'm not sure how to do it, though.

Super Mentor

@Krupal Jagtap

Log in to the Ambari UI and navigate to the left panel of the Ambari Dashboard:

Ambari Metrics --> "Service Actions" (Drop down menu) --> Stop
Ambari Metrics --> "Service Actions" (Drop down menu) --> Delete Service
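As an alternative to the UI (for example, when the Delete Service entry is not offered), the same stop-then-delete sequence can be sent to the Ambari REST API. This is a sketch assuming the standard v1 services endpoint, where stopping a service means setting its state to INSTALLED. The server, cluster, and credentials reuse the example values from earlier in the thread; `ams_service_requests` is a hypothetical helper.

```python
import base64
import json
import urllib.request

def ams_service_requests(server, cluster, user="admin", password="admin",
                         service="AMBARI_METRICS"):
    """Build (but do not send) the stop and delete requests for one Ambari service."""
    url = f"http://{server}/api/v1/clusters/{cluster}/services/{service}"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "X-Requested-By": "ambari"}
    # Stopping a service means moving its desired state to INSTALLED.
    stop_body = json.dumps({
        "RequestInfo": {"context": f"Stop {service}"},
        "Body": {"ServiceInfo": {"state": "INSTALLED"}},
    }).encode()
    stop = urllib.request.Request(url, data=stop_body, headers=headers, method="PUT")
    delete = urllib.request.Request(url, headers=headers, method="DELETE")
    return stop, delete

stop, delete = ams_service_requests("hdfcluster1.example.com:8080", "TestCluster")
for r in (stop, delete):
    print(r.get_method(), r.full_url)
# Send them in order with urllib.request.urlopen(...). Wait until the stop
# request has finished in Ambari before sending the delete.
```

The stop is handled by Ambari as a background request, so the delete should only be sent once the service shows as stopped.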


@Jay Kumar SenSharma, there is no Delete option even after the service is stopped, only Restart and Move.

@Jay Kumar SenSharma Hi Jay, so I removed the service and added it again through the Ambari Web UI. Still, one of the hosts does not generate any metrics, while the other host displays the graphs and everything. What might be the problem here?

The ambari-metrics-monitor log shows the following:

2018-04-05 00:07:04,749 [INFO] controller.py:110 - Adding event to cache, : {u'metrics': [], u'collect_every': u'15'}
2018-04-05 00:07:04,749 [INFO] main.py:65 - Starting Server RPC Thread: /usr/lib/python2.6/site-packages/resource_monitoring/main.py start
2018-04-05 00:07:04,749 [INFO] controller.py:57 - Running Controller thread: Thread-1
2018-04-05 00:07:04,750 [INFO] emitter.py:45 - Running Emitter thread: Thread-2
2018-04-05 00:07:04,750 [INFO] emitter.py:65 - Nothing to emit, resume waiting.
2018-04-05 00:08:04,752 [WARNING] emitter.py:74 - Error sending metrics to server. 'NoneType' object has no attribute 'strip'
2018-04-05 00:08:04,752 [WARNING] emitter.py:80 - Retrying after 5 ...
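In a monitor log like this, the useful lines are the WARNING/ERROR entries from the emitter, which is the part that sends metrics to the collector. A minimal sketch for filtering them out, assuming each entry starts with a "YYYY-MM-DD HH:MM:SS,mmm [LEVEL] file.py:line - " prefix as in the excerpt above. `problems` is a hypothetical helper, and the pattern works on the log whether it is pasted one entry per line or run together on a single line.

```python
import re

# Entry prefix as seen in the monitor log excerpt above.
TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}"
ENTRY_RE = re.compile(
    r"(?P<ts>" + TS + r") \[(?P<level>[A-Z]+)\] (?P<src>\S+) - "
    r"(?P<msg>.*?)(?=\s*" + TS + r" \[|\Z)",  # message runs until the next entry
    re.S,
)

def problems(text, levels=("WARNING", "ERROR")):
    """Return (timestamp, source, message) for entries at the given levels."""
    return [(m["ts"], m["src"], m["msg"].strip())
            for m in ENTRY_RE.finditer(text) if m["level"] in levels]

sample = (
    "2018-04-05 00:07:04,750 [INFO] emitter.py:65 - Nothing to emit, resume waiting. "
    "2018-04-05 00:08:04,752 [WARNING] emitter.py:74 - Error sending metrics to server. "
    "'NoneType' object has no attribute 'strip' "
    "2018-04-05 00:08:04,752 [WARNING] emitter.py:80 - Retrying after 5 ..."
)
for entry in problems(sample):
    print(entry)
```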

And if I try it as the hdfs user, it does not return any output.