Created 10-10-2018 09:21 PM
Ambari version = 2.6.6
Attempted a rolling upgrade from HDP 2.5 to 2.6, which was aborted during the initial step. After that, Ambari Metrics got into an odd state.
Ambari server database check says the following on start up:
WARN - You have non selected configs: ams-hbase-security-site,ams-grafana-env,ams-ssl-client,ams-ssl-server,ams-hbase-policy,ams-hbase-site for service AMBARI_METRICS from cluster XXXXXX
I can't start any of the Ambari Metrics components (Metrics Collector, Grafana, or Metrics Monitor); they all show a similar stack trace in the Ambari agent log (below).
How do I recover from this?
Thank you
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_grafana.py", line 79, in <module>
    AmsGrafana().execute()
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/script.py", line 375, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_grafana.py", line 42, in start
    import params
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/params.py", line 29, in <module>
    import status_params
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/status_params.py", line 27, in <module>
    from params_linux import *
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/params_linux.py", line 68, in <module>
    grafana_pid_file = format("{ams_grafana_pid_dir}/grafana-server.pid")
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/format.py", line 95, in format
    return ConfigurationFormatter().format(format_string, args, **result)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/functions/format.py", line 59, in format
    result_protected = self.vformat(format_string, args, all_params)
  File "/usr/lib64/python2.6/string.py", line 549, in vformat
    result = self._vformat(format_string, args, kwargs, used_args, 2)
  File "/usr/lib64/python2.6/string.py", line 582, in _vformat
    result.append(self.format_field(obj, format_spec))
  File "/usr/lib64/python2.6/string.py", line 599, in format_field
    return format(value, format_spec)
  File "/usr/lib/ambari-agent/lib/resource_management/libraries/script/config_dictionary.py", line 73, in __getattr__
    raise Fail("Configuration parameter '" + self.name + "' was not found in configurations dictionary!")
resource_management.core.exceptions.Fail: Configuration parameter 'ams-grafana-env' was not found in configurations dictionary!
Created 10-11-2018 06:45 AM
It looks like there are inconsistencies in the Ambari database. We will likely have to fix each of the config types listed in the database check warning you get on startup:
ams-hbase-security-site,ams-grafana-env,ams-ssl-client,ams-ssl-server,ams-hbase-policy,ams-hbase-site
First, make sure you have a backup of the Ambari database. Once you have a backup, connect to the Ambari database (default password 'bigdata') and set these config types back to the selected state. For example:
update clusterconfig set selected=1 where type_name='ams-hbase-security-site';
update clusterconfig set selected=1 where type_name='ams-grafana-env';
and so on for the remaining config types.
Created 10-11-2018 06:32 PM
You need to select only the latest config, so the above update is dangerous if there is more than one row in clusterconfig for a given type_name. You should be able to sort by create_timestamp and choose the latest config_id.
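To illustrate the "select only the latest" idea, here is a minimal Python sketch. It runs against an in-memory SQLite table that is only a simplified stand-in for clusterconfig (the real table has more columns and lives in the Ambari database, not in SQLite), and the helper names select_latest_config, build_demo_db and selected_ids are made up for this example. It picks the newest row per type_name by create_timestamp, using config_id as a tie-breaker, and marks only that row as selected.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory stand-in for clusterconfig (assumed, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE clusterconfig ("
        " config_id INTEGER PRIMARY KEY,"
        " type_name TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " selected INTEGER NOT NULL,"
        " create_timestamp INTEGER NOT NULL)"
    )
    # Hypothetical rows: two versions of ams-hbase-site, one of ams-grafana-env,
    # none of them selected (the state reported by the database check).
    rows = [
        (101, "ams-hbase-site", 1, 0, 1500000000000),
        (102, "ams-hbase-site", 2, 0, 1530000000000),
        (201, "ams-grafana-env", 1, 0, 1500000000000),
    ]
    conn.executemany("INSERT INTO clusterconfig VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn


def select_latest_config(conn, type_name):
    """Mark only the newest row of type_name as selected.

    Returns the chosen config_id, or None if the type has no rows.
    """
    row = conn.execute(
        "SELECT config_id FROM clusterconfig WHERE type_name = ? "
        "ORDER BY create_timestamp DESC, config_id DESC LIMIT 1",
        (type_name,),
    ).fetchone()
    if row is None:
        return None
    latest_id = row[0]
    # Run both updates in one transaction so a failure leaves the table unchanged.
    with conn:
        conn.execute(
            "UPDATE clusterconfig SET selected = 0 WHERE type_name = ?",
            (type_name,),
        )
        conn.execute(
            "UPDATE clusterconfig SET selected = 1 WHERE config_id = ?",
            (latest_id,),
        )
    return latest_id


def selected_ids(conn, type_name):
    """Return the config_ids currently selected for type_name, in ascending order."""
    cur = conn.execute(
        "SELECT config_id FROM clusterconfig "
        "WHERE type_name = ? AND selected = 1 ORDER BY config_id",
        (type_name,),
    )
    return [r[0] for r in cur.fetchall()]


if __name__ == "__main__":
    demo = build_demo_db()
    for name in ("ams-hbase-site", "ams-grafana-env"):
        chosen = select_latest_config(demo, name)
        print(name, "->", chosen, selected_ids(demo, name))
```

The point of the sketch is the final check: after the update, each type_name should have exactly one selected row. On a real cluster you would run the equivalent SQL by hand, one type at a time, after taking a backup.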
Alternatively, you can delete and re-add the service if the above seems complicated. The re-add should get you to a clean state. Make sure to take a database backup at every stage. The metrics data can be preserved by backing up the directory that hbase.rootdir points to in the ams-hbase-site configuration; after the re-add, stop AMS, replace the HBase data, and restart.
Created 10-21-2018 10:08 PM
Thank you very much for all the responses; both of them helped in getting it straightened out. For each config type, I picked the latest config by timestamp and config_id and set it as selected. I am now able to restart the Ambari Metrics services:
select config_id, version, type_name, unmapped, selected, selected_timestamp from clusterconfig where type_name='ams-hbase-site' order by selected_timestamp desc;
update clusterconfig set selected=1 where type_name='ams-hbase-site' and config_id=xxx;
Uma.