ambari monitor failed to start

Ambari Metrics Monitor failed to start on one of my 2 hosts. Here is the error:

stderr:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 58, in <module>
    AmsMonitor().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 28, in install
    self.install_packages(env, exclude_packages = ['ambari-metrics-collector', 'ambari-metrics-grafana'])
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 410, in install_packages
    retry_count=agent_stack_retry_count)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 54, in action_install
    self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 49, in install_package
    self.checked_call_with_retries(cmd, sudo=True, logoutput=self.get_logoutput())
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 83, in checked_call_with_retries
    return self._call_with_retries(cmd, is_checked=True, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 91, in _call_with_retries
    code, out = func(cmd, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-monitor' returned 1. Error: Nothing to do

stdout:
2018-04-03 05:48:24,805 - The hadoop conf dir /usr/hdp/current/hadoop-client/conf exists, will call conf-select on it for version 2.3.6.0-3796
2018-04-03 05:48:24,805 - Checking if need to create versioned conf dir /etc/hadoop/2.3.6.0-3796/0
2018-04-03 05:48:24,805 - call['conf-select create-conf-dir --package hadoop --stack-version 2.3.6.0-3796 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False, 'stderr': -1}
2018-04-03 05:48:24,837 - call returned (1, '/etc/hadoop/2.3.6.0-3796/0 exist already', '')
2018-04-03 05:48:24,837 - checked_call['conf-select set-conf-dir --package hadoop --stack-version 2.3.6.0-3796 --conf-version 0'] {'logoutput': False, 'sudo': True, 'quiet': False}
2018-04-03 05:48:24,868 - checked_call returned (0, '')
2018-04-03 05:48:24,868 - Ensuring that hadoop has the correct symlink structure
2018-04-03 05:48:24,869 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2018-04-03 05:48:24,871 - Group['hadoop'] {}
2018-04-03 05:48:24,873 - Group['users'] {}
2018-04-03 05:48:24,873 - User['hive'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,875 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,876 - User['ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,877 - User['oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2018-04-03 05:48:24,878 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2018-04-03 05:48:24,879 - User['tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2018-04-03 05:48:24,880 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,881 - User['sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,882 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,883 - User['hcat'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,884 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2018-04-03 05:48:24,885 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2018-04-03 05:48:24,888 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2018-04-03 05:48:24,893 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2018-04-03 05:48:24,894 - Group['hdfs'] {}
2018-04-03 05:48:24,894 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'hdfs']}
2018-04-03 05:48:24,895 - FS Type:
2018-04-03 05:48:24,895 - Directory['/etc/hadoop'] {'mode': 0755}
2018-04-03 05:48:24,919 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2018-04-03 05:48:24,920 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 0777}
2018-04-03 05:48:24,937 - Repository['HDP-2.3'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.6.0', 'action': ['create'], 'components': ['HDP', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP', 'mirror_list': None}
2018-04-03 05:48:24,949 - File['/etc/yum.repos.d/HDP.repo'] {'content': '[HDP-2.3]\nname=HDP-2.3\nbaseurl=http://public-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.6.0\n\npath=/\nenabled=1\ngpgcheck=0'}
2018-04-03 05:48:24,950 - Repository['HDP-UTILS-1.1.0.20'] {'base_url': 'http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos6', 'action': ['create'], 'components': ['HDP-UTILS', 'main'], 'repo_template': '[{{repo_id}}]\nname={{repo_id}}\n{% if mirror_list %}mirrorlist={{mirror_list}}{% else %}baseurl={{base_url}}{% endif %}\n\npath=/\nenabled=1\ngpgcheck=0', 'repo_file_name': 'HDP-UTILS', 'mirror_list': None}
2018-04-03 05:48:24,955 - File['/etc/yum.repos.d/HDP-UTILS.repo'] {'content': '[HDP-UTILS-1.1.0.20]\nname=HDP-UTILS-1.1.0.20\nbaseurl=http://public-repo-1.hortonworks.com/HDP-UTILS-1.1.0.20/repos/centos6\n\npath=/\nenabled=1\ngpgcheck=0'}
2018-04-03 05:48:24,956 - Package['unzip'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2018-04-03 05:48:25,089 - Skipping installation of existing package unzip
2018-04-03 05:48:25,089 - Package['curl'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2018-04-03 05:48:25,109 - Skipping installation of existing package curl
2018-04-03 05:48:25,110 - Package['hdp-select'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2018-04-03 05:48:25,130 - Skipping installation of existing package hdp-select
2018-04-03 05:48:25,355 - Package['ambari-metrics-monitor'] {'retry_on_repo_unavailability': False, 'retry_count': 5}
2018-04-03 05:48:25,488 - Installing package ambari-metrics-monitor ('/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-monitor')
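
In output this long, the line that matters most is the final `Fail` message, which names the command that was run and its exit code. As a rough sketch (the pattern and the `parse_fail_line` helper are hypothetical illustrations, not part of Ambari), that line could be picked apart like this:

```python
import re

# Final line of the traceback quoted in the question.
FAIL_LINE = (
    "resource_management.core.exceptions.Fail: Execution of "
    "'/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-monitor' "
    "returned 1. Error: Nothing to do"
)

# Hypothetical pattern for this message shape; an assumption, not an Ambari API.
FAIL_PATTERN = re.compile(
    r"Execution of '(?P<command>.+?)' returned (?P<code>\d+)\.\s*(?P<detail>.*)"
)


def parse_fail_line(line):
    """Return (command, exit_code, detail), or None if the line does not match."""
    match = FAIL_PATTERN.search(line)
    if match is None:
        return None
    return match.group("command"), int(match.group("code")), match.group("detail")


command, exit_code, detail = parse_fail_line(FAIL_LINE)
print(command)
print(exit_code, detail)
```

Here the failing step is the `yum install` of `ambari-metrics-monitor`, so the package state on that host is the first thing to check.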

1 ACCEPTED SOLUTION

Master Mentor

@Krupal Jagtap

The problem is a version mismatch: the Ambari version and the Ambari Metrics component versions should be the same.

# rpm -qa | grep ambari
 ambari-metrics-collector-2.1.0-1470.x86_64
 ambari-metrics-monitor-2.1.0-1470.x86_64
 ambari-server-2.2.2.0-460.x86_64 
 ambari-metrics-hadoop-sink-2.1.0-1470.x86_64 
 ambari-agent-2.2.2.0-460.x86_64

It looks like you have not performed the Ambari post-upgrade steps, so your AMS binaries are still old (2.1.0), whereas the Ambari binaries are 2.2.2.
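
For anyone checking several hosts, here is a minimal sketch of the same comparison in Python. It assumes the `rpm -qa | grep ambari` output shown above has been captured as text, and it treats an exact version match with `ambari-agent` as the target; the helper names are hypothetical.

```python
import re

# Output of `rpm -qa | grep ambari` as quoted above.
RPM_OUTPUT = """\
ambari-metrics-collector-2.1.0-1470.x86_64
ambari-metrics-monitor-2.1.0-1470.x86_64
ambari-server-2.2.2.0-460.x86_64
ambari-metrics-hadoop-sink-2.1.0-1470.x86_64
ambari-agent-2.2.2.0-460.x86_64
"""

# name-version-release.arch; the version is the first field starting with a digit.
PACKAGE_PATTERN = re.compile(
    r"^(?P<name>.+?)-(?P<version>\d[\w.]*)-(?P<release>[\w.]+)\.(?P<arch>\w+)$"
)


def parse_versions(rpm_output):
    """Map package name -> version for every line that looks like an RPM name."""
    versions = {}
    for line in rpm_output.splitlines():
        match = PACKAGE_PATTERN.match(line.strip())
        if match:
            versions[match.group("name")] = match.group("version")
    return versions


def mismatched_ams_packages(versions, reference="ambari-agent"):
    """List ambari-metrics-* packages whose version differs from the reference."""
    expected = versions[reference]
    return sorted(
        name
        for name, version in versions.items()
        if name.startswith("ambari-metrics-") and version != expected
    )


versions = parse_versions(RPM_OUTPUT)
for name in mismatched_ams_packages(versions):
    print(name, versions[name], "!=", versions["ambari-agent"])
```

On the output above, all three `ambari-metrics-*` packages are reported as out of step with the 2.2.2.0 agent.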

1. Stop the AMS Collector service from the Ambari UI, then perform the AMS post-upgrade steps: https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_upgrading_Ambari/content/_upgrade_ambari...

Please verify that you have the correct ambari.repo (the repo should be for version 2.2.2, NOT 2.1.0):

# grep -F '2.2.2' /etc/yum.repos.d/ambari.repo

2. Then run the following on every host where the AMS binary version is 2.1.0:

# yum clean all
# yum upgrade ambari-metrics-monitor ambari-metrics-hadoop-sink

3. On the host where the Ambari Metrics Collector is installed, run:

# yum upgrade ambari-metrics-collector
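
The ambari.repo check from step 1 can also be scripted before running the upgrades. This is only a sketch: the repo file content below is a made-up sample (the section name and the `repo.example.com` URL are assumptions, not your real file), and the helper names are hypothetical.

```python
import configparser

# Hypothetical sample of /etc/yum.repos.d/ambari.repo; real content will differ.
SAMPLE_REPO = """\
[Updates-ambari-2.2.2.0]
name=ambari-2.2.2.0 - Updates
baseurl=http://repo.example.com/ambari/centos6/2.x/updates/2.2.2.0
gpgcheck=1
enabled=1
"""


def repo_base_urls(repo_text):
    """Map each repo section to its baseurl ('' when the key is absent)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    return {
        section: parser.get(section, "baseurl", fallback="")
        for section in parser.sections()
    }


def repos_missing_version(repo_text, version):
    """Return the sections whose baseurl does not mention the given version."""
    return [
        section
        for section, url in repo_base_urls(repo_text).items()
        if version not in url
    ]


# An empty list means every repo section points at a 2.2.2 location.
print(repos_missing_version(SAMPLE_REPO, "2.2.2"))
```

If any section is listed, fix the repo file first; otherwise `yum upgrade` may not find the 2.2.2 AMS packages.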

21 REPLIES

One other thing: the metrics data is unavailable for only one master host. For the other host, the graphs are visible.

Master Mentor

@Krupal Jagtap

You can use the Ambari API approach to delete the AMS monitor component.

Example:

# curl  -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://hdfcluster1.example.com:8080/api/v1/clusters/TestCluster/hosts/hdfcluster3.example.com/host_c...

Here "hdfcluster3.example.com" is the host from which we want to delete the Metrics Monitor.
"hdfcluster1.example.com:8080" is the Ambari server host and port.
"TestCluster" is the cluster name.

Please replace these values accordingly.

Once it is deleted, it is better to clean out the "/etc/ambari-metrics-monitor" directory on that host. It is also better to keep a backup, so move the directories aside instead of deleting them:

# mv /etc/ambari-metrics-monitor  /etc/ambari-metrics-monitor_BKP
# mv /usr/lib/python2.6/site-packages/resource_monitoring /usr/lib/python2.6/site-packages/resource_monitoring_BKP
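
If you prefer to script this step, here is a minimal sketch of the same move-aside idea. The `move_aside` helper is hypothetical, and refusing to overwrite an existing backup is an extra safety choice that the `mv` commands above do not make.

```python
import os
import shutil


def move_aside(path, suffix="_BKP"):
    """Rename path to path + suffix.

    Returns the backup path, or None when path does not exist.
    Raises FileExistsError rather than overwriting an earlier backup.
    """
    if not os.path.exists(path):
        return None
    backup = path + suffix
    if os.path.exists(backup):
        raise FileExistsError(backup)
    shutil.move(path, backup)
    return backup


# Usage on the affected host (run as root), mirroring the two mv commands:
#   move_aside("/etc/ambari-metrics-monitor")
#   move_aside("/usr/lib/python2.6/site-packages/resource_monitoring")
```

Nothing runs at import time; the calls are left as comments so the paths can be reviewed first.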

Then reinstall the monitor on this host from the Ambari UI.

Hi,

I tried the curl command to remove it. It returned the following error:

{ "status" : 500, "message" : "org.apache.ambari.server.controller.spi.SystemException: An internal system exception occurred: Host Component cannot be removed, clusterName=Qc, serviceName=AMBARI_METRICS, componentName=METRICS_MONITOR, hostname=itxcqchdp01.catmdev.com, request={ clusterName=CardtronicsQc, serviceName=AMBARI_METRICS, componentName=METRICS_MONITOR, hostname=itxcqchdp01.catmdev.com, desiredState=null, state=null, desiredStackId=null, staleConfig=null, adminState=null }"

The problem seems to be on the main host itself.

Master Mentor

@Krupal Jagtap

We will have to look at the complete "ambari-server.log" to understand what the issue might be.

However, it looks like it might be due to a database inconsistency.

Take a fresh Ambari DB dump (as a backup).

Is it possible to just stop the "Ambari Metrics" service, delete the whole "Ambari Metrics" service from the Ambari UI, and then reinstall it as a quick attempt at a fix? (If you do not have much AMS data, deleting this service and reinstalling it might be the better option.)


Yes, of course. I do not have much AMS data, and I think deleting it completely and reinstalling it will be the better option. I am not sure how to do it, though.

Master Mentor

@Krupal Jagtap

Log in to the Ambari UI and then navigate to the left panel of the Ambari Dashboard:

Ambari Metrics --> "Service Actions" (Drop down menu) --> Stop
Ambari Metrics --> "Service Actions" (Drop down menu) --> Delete Service

Jay Kumar SenSharma, there is no Delete option even after the service is stopped, just Restart and Move.

Jay Kumar SenSharma, hi Jay. I removed the service and added it again through the Ambari web UI. Still, one of the hosts does not generate any metrics. The other host displays the graphs and everything. What might be the problem here?

The ambari-metrics-monitor has the following log:

2018-04-05 00:07:04,749 [INFO] controller.py:110 - Adding event to cache, : {u'metrics': [], u'collect_every': u'15'}
2018-04-05 00:07:04,749 [INFO] main.py:65 - Starting Server RPC Thread: /usr/lib/python2.6/site-packages/resource_monitoring/main.py start
2018-04-05 00:07:04,749 [INFO] controller.py:57 - Running Controller thread: Thread-1
2018-04-05 00:07:04,750 [INFO] emitter.py:45 - Running Emitter thread: Thread-2
2018-04-05 00:07:04,750 [INFO] emitter.py:65 - Nothing to emit, resume waiting.
2018-04-05 00:08:04,752 [WARNING] emitter.py:74 - Error sending metrics to server. 'NoneType' object has no attribute 'strip'
2018-04-05 00:08:04,752 [WARNING] emitter.py:80 - Retrying after 5 ...
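
To pull only the warnings out of a longer monitor log, something like the following sketch may help. The line pattern is an assumption inferred from the entries quoted above, and `warnings_only` is a hypothetical helper, not part of the monitor.

```python
import re

# Three of the entries quoted above, one per line.
LOG_LINES = [
    "2018-04-05 00:07:04,750 [INFO] emitter.py:65 - Nothing to emit, resume waiting.",
    "2018-04-05 00:08:04,752 [WARNING] emitter.py:74 - Error sending metrics to server. "
    "'NoneType' object has no attribute 'strip'",
    "2018-04-05 00:08:04,752 [WARNING] emitter.py:80 - Retrying after 5 ...",
]

# Assumed layout: "<timestamp> [<LEVEL>] <file>:<line> - <message>".
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<level>[A-Z]+)\] (?P<source>\S+:\d+) - (?P<message>.*)$"
)


def warnings_only(lines):
    """Return parsed entries (as dicts) whose level is WARNING or ERROR."""
    entries = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("level") in ("WARNING", "ERROR"):
            entries.append(match.groupdict())
    return entries


for entry in warnings_only(LOG_LINES):
    print(entry["timestamp"], entry["source"], entry["message"])
```

In this log the first warning comes from `emitter.py:74`, which is where the metrics are sent to the server.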

And if I try it as the hdfs user, it does not return any output.