Created 11-05-2015 04:47 AM
On a 6 node cluster, using Ambari 2.1.2/HDP 2.3.2.
Scenario 1:
During the HDP install, at the step where services are installed across all nodes, the install suddenly fails because Ambari Metrics/Monitors fail to install. Full stack trace from the Ambari UI is below.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 131, in <module>
    AmsCollector().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 34, in install
    self.install_packages(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 395, in install_packages
    Package(name)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 45, in action_install
    self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 49, in install_package
    shell.checked_call(cmd, sudo=True, logoutput=self.get_logoutput())
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-collector' returned 1. Error: Nothing to do
Running the command from the last line of the error by hand yields the same response. No Ambari Metrics logs were generated, and the Ambari server log didn't contain any useful info.
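As far as I know, on yum 3.x a return code of 1 together with "Error: Nothing to do" usually means yum could not find the package in any enabled repository, and the `-d 0 -e 0` flags hide the more specific message. A minimal sketch of the follow-up checks is below. The helper name `diagnostic_commands` is made up for illustration and it only builds the command lines; it does not run them.

```python
import shlex


def diagnostic_commands(package):
    """Return yum command lines that help explain a 'Nothing to do' failure.

    Hypothetical helper: it only builds strings, nothing is executed.
    """
    quoted = shlex.quote(package)
    return [
        # Same install as in the traceback, minus the debug/error-level flags
        # and -y, so yum prints its reasoning and asks before installing.
        "/usr/bin/yum install " + quoted,
        # Lists the repositories yum will actually search.
        "/usr/bin/yum repolist enabled",
        # Shows installed and available versions of the package, if any.
        "/usr/bin/yum list " + quoted,
    ]


if __name__ == "__main__":
    for cmd in diagnostic_commands("ambari-metrics-collector"):
        print(cmd)
```

If `yum list` shows nothing for the package on the failing host, the repo definitions on that host are the first thing to look at.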
Scenario 2:
I reset ambari-server and cleaned up all hosts, then re-ran the Ambari wizard and installed all services except Ambari Metrics. HDP installed successfully. I then added Ambari Metrics back, and I'm getting the same error.
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 131, in <module>
    AmsCollector().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 34, in install
    self.install_packages(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 395, in install_packages
    Package(name)
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 45, in action_install
    self.install_package(package_name, self.resource.use_repos, self.resource.skip_repos)
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 49, in install_package
    shell.checked_call(cmd, sudo=True, logoutput=self.get_logoutput())
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/bin/yum -d 0 -e 0 -y install ambari-metrics-collector' returned 1. Error: Nothing to do
Created 11-06-2015 02:56 AM
Just got off webex with @rgarcia@hortonworks.com
We were able to remove the failed Ranger/Metrics installation using something like the below:
su postgres
psql
\c ambari
DELETE FROM ambari.hostcomponentstate WHERE service_name IN ('RANGER');
DELETE FROM ambari.hostcomponentdesiredstate WHERE service_name IN ('RANGER');
DELETE FROM ambari.servicecomponentdesiredstate WHERE service_name IN ('RANGER');
DELETE FROM ambari.servicedesiredstate WHERE service_name IN ('RANGER');
DELETE FROM ambari.clusterservices WHERE service_name IN ('RANGER');
Then make sure to restart Ambari
service ambari-server restart
Now re-install Ranger/Ambari Metrics.
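The cleanup above only shows RANGER. A rough sketch of generating the same statements for any service, for example AMBARI_METRICS (the service name as it appears in the traceback path), might look like the following. The function name `cleanup_statements` is hypothetical, the table list and order are copied from the post, and wrapping everything in BEGIN/COMMIT is my own addition so a failed statement does not leave the tables half-cleaned. Take a backup of the ambari database before running anything like this.

```python
# Tables and order copied from the post above.
SERVICE_TABLES = [
    "hostcomponentstate",
    "hostcomponentdesiredstate",
    "servicecomponentdesiredstate",
    "servicedesiredstate",
    "clusterservices",
]


def cleanup_statements(service_names, schema="ambari"):
    """Build DELETE statements for the given services (hypothetical helper).

    Returns a list of SQL strings to paste into psql; nothing is executed.
    """
    for name in service_names:
        # Service names look like RANGER or AMBARI_METRICS; reject anything else
        # so the generated SQL cannot contain quotes or other surprises.
        if not name.replace("_", "").isalnum():
            raise ValueError("unexpected service name: %r" % name)
    in_list = ", ".join("'%s'" % n for n in service_names)
    statements = ["BEGIN;"]  # assumption: transaction wrapper is not in the post
    for table in SERVICE_TABLES:
        statements.append(
            "DELETE FROM %s.%s WHERE service_name IN (%s);" % (schema, table, in_list)
        )
    statements.append("COMMIT;")
    return statements


if __name__ == "__main__":
    print("\n".join(cleanup_statements(["RANGER", "AMBARI_METRICS"])))
```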
Created 11-05-2015 12:23 PM
@rgarcia@hortonworks.com Please see this
Also, I have added Jeff & Mahadev in this loop.
Created 11-05-2015 06:57 PM
Followed the recommended AH link from @Neeraj. See details below.
Logged in to postgresql:
ambari=> select * from hostcomponentstate where component_name LIKE '%RANGER%';
 id  | cluster_id | component_name    | version      | current_stack_id | current_state  | host_id | service_name | upgrade_state | security_state
-----+------------+-------------------+--------------+------------------+----------------+---------+--------------+---------------+----------------
 163 |          2 | RANGER_KMS_SERVER | 2.3.2.0-2950 |                4 | INSTALLED      |       5 | RANGER_KMS   | NONE          | UNKNOWN
 164 |          2 | RANGER_USERSYNC   | UNKNOWN      |                4 | INSTALL_FAILED |       4 | RANGER       | NONE          | UNKNOWN
 165 |          2 | RANGER_ADMIN      | UNKNOWN      |                4 | INSTALL_FAILED |       4 | RANGER       | NONE          | UNKNOWN
(3 rows)

ambari=> select * from hostcomponentdesiredstate where component_name LIKE '%RANGER%';
 cluster_id | component_name | desired_stack_id | desired_state | host_id | service_name | admin_state | maintenance_state | security_state | restart_required
------------+----------------+------------------+---------------+---------+--------------+-------------+-------------------+----------------+------------------
(0 rows)

ambari=> select * from servicecomponentdesiredstate where component_name LIKE '%RANGER%';
 component_name    | cluster_id | desired_stack_id | desired_state | service_name
-------------------+------------+------------------+---------------+--------------
 RANGER_KMS_SERVER |          2 |                4 | INSTALLED     | RANGER_KMS
 RANGER_ADMIN      |          2 |                4 | INSTALLED     | RANGER
 RANGER_USERSYNC   |          2 |                4 | INSTALLED     | RANGER
(3 rows)
Delete in tables:
ambari=> delete from hostcomponentstate where component_name LIKE '%RANGER%';
DELETE 3
ambari=> delete from servicecomponentdesiredstate where component_name LIKE '%RANGER%';
DELETE 3
Then delete Services:
[root@great-wall02 ~]# curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://great-wall01.cloud.hortonworks.com:8080/api/v1/clusters/smesecurity/services/RANGER
[root@great-wall02 ~]# curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://great-wall01.cloud.hortonworks.com:8080/api/v1/clusters/smesecurity/services/RANGER_KMS
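For anyone scripting the two curl calls above, here is a minimal Python 3 sketch of the same DELETE request using only the standard library. `build_delete_request` is a hypothetical helper name; the URL layout, credentials, and `X-Requested-By` header are taken from the curl commands. It builds the request without sending it.

```python
import base64
import urllib.request


def build_delete_request(base_url, cluster, service, user, password):
    """Build (but do not send) a DELETE request for one Ambari service."""
    url = "%s/api/v1/clusters/%s/services/%s" % (base_url.rstrip("/"), cluster, service)
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8")).decode("ascii")
    headers = {
        # Equivalent of curl's -u user:password (HTTP basic auth).
        "Authorization": "Basic " + token,
        # Same header the curl commands pass with -H.
        "X-Requested-By": "ambari",
    }
    return urllib.request.Request(url, headers=headers, method="DELETE")


if __name__ == "__main__":
    for service in ("RANGER", "RANGER_KMS"):
        req = build_delete_request(
            "http://great-wall01.cloud.hortonworks.com:8080",
            "smesecurity", service, "admin", "admin",
        )
        print(req.get_method(), req.full_url)
        # To actually send it: urllib.request.urlopen(req)
```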
Deleted the databases for ranger: 'ranger', 'ranger_kms', 'ranger_audit'.
I tried reinstalling only Ranger but am now getting the error below. It looks like there is still some metadata in the Ambari DB that needs to be cleaned up. Which Ambari tables should I clean up?
05 Nov 2015 10:51:25,540 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-hbase-security-site
05 Nov 2015 10:51:25,541 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-hbase-site
05 Nov 2015 10:51:25,541 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-log4j
05 Nov 2015 10:51:25,542 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-site
05 Nov 2015 10:51:25,542 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-hbase-policy
05 Nov 2015 10:51:25,543 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-hbase-log4j
05 Nov 2015 10:51:25,543 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-env
05 Nov 2015 10:51:25,544 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ams-hbase-env
05 Nov 2015 10:51:25,544 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=kms-properties
05 Nov 2015 10:51:25,545 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ranger-kms-policymgr-ssl
05 Nov 2015 10:51:25,545 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=kms-log4j
05 Nov 2015 10:51:25,545 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ranger-kms-security
05 Nov 2015 10:51:25,546 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ranger-kms-audit
05 Nov 2015 10:51:25,546 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=dbks-site
05 Nov 2015 10:51:25,546 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=kms-env
05 Nov 2015 10:51:25,547 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=kms-site
05 Nov 2015 10:51:25,547 ERROR [qtp-client-26] ClusterImpl:2016 - Config inconsistency exists: unknown configType=ranger-kms-site
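The config types in the log above all look like Ambari Metrics (ams-*) and Ranger KMS (kms-*, dbks-site, ranger-kms-*) configs, which hints at leftover config rows for the removed services. A small sketch for pulling the distinct names out of an ambari-server log is below. The function name `unknown_config_types` is hypothetical, and the regex is based only on the message format shown in this thread.

```python
import re

# Pattern based on the ERROR lines quoted in this thread.
CONFIG_TYPE = re.compile(r"Config inconsistency exists: unknown configType=([\w-]+)")


def unknown_config_types(log_text):
    """Return the distinct config types reported as unknown, in first-seen order."""
    seen = []
    for match in CONFIG_TYPE.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    sample = (
        "05 Nov 2015 10:51:25,541 ERROR [qtp-client-26] ClusterImpl:2016 - "
        "Config inconsistency exists: unknown configType=ams-hbase-site\n"
        "05 Nov 2015 10:51:25,546 ERROR [qtp-client-26] ClusterImpl:2016 - "
        "Config inconsistency exists: unknown configType=kms-env\n"
    )
    for name in unknown_config_types(sample):
        print(name)
```

The resulting list gives you concrete names to search for when going through the config-related tables.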
Created 11-05-2015 07:00 PM
Created 11-05-2015 07:50 PM
I recommend you select from every table in the ambari database and look for rows that reference the specific service. There are usually stale alerts and host configs left over, and you need to purge those as well. Don't just delete from hostcomponentstate, hostcomponentdesiredstate, etc. Sometimes you're lucky and only need to touch the desired-state and component-state tables, but sometimes you also need to clean up alerts, host configs, and so on.
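One way to start that search, sketched below, is to ask PostgreSQL's information_schema which tables have a service_name column and then count matching rows in each. Both function names are hypothetical, and the schema name `ambari` is an assumption based on the DELETE statements elsewhere in this thread. Tables that reference the service through a different column (alerts or configs, for example) will not show up this way and still need a manual look.

```python
def tables_with_column_query(column="service_name", schema="ambari"):
    """SQL that lists tables in the schema having the given column."""
    return (
        "SELECT table_name FROM information_schema.columns "
        "WHERE table_schema = '%s' AND column_name = '%s' "
        "ORDER BY table_name;" % (schema, column)
    )


def count_queries(tables, service, schema="ambari", column="service_name"):
    """One row-count query per table, to see where the service still appears."""
    return [
        "SELECT '%s' AS table_name, count(*) FROM %s.%s WHERE %s = '%s';"
        % (table, schema, table, column, service)
        for table in tables
    ]


if __name__ == "__main__":
    print(tables_with_column_query())
    # Feed the table names returned by the first query into the second step.
    for sql in count_queries(["hostcomponentstate", "clusterservices"], "RANGER"):
        print(sql)
```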
Created 11-06-2015 11:33 AM
Created 02-11-2016 10:46 PM
I have accepted this as best answer to close the thread.
Created 02-09-2016 04:05 PM
What is the solution to this problem? I'm running into it as well. I'm able to delete and re-install, but it fails with the same error.
Created 02-09-2016 04:58 PM
@rxu There are a couple of things to check. Make sure the repo files are correct.
What's the error?
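To make "check the repo files" a bit more concrete, here is a rough Python 3 sketch that reads the text of a yum .repo file (these normally live under /etc/yum.repos.d/) and lists the repositories that are not disabled, with their baseurl. `enabled_repos` is a hypothetical helper, and the sample repo IDs and URLs are placeholders, not real Ambari or HDP repositories.

```python
import configparser


def enabled_repos(repo_text):
    """Return {repo_id: baseurl} for sections not disabled with enabled=0."""
    # .repo files are INI-style; interpolation is off so '%' in URLs is harmless.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(repo_text)
    repos = {}
    for section in parser.sections():
        # yum treats a repo as enabled when the 'enabled' key is absent.
        if parser.get(section, "enabled", fallback="1").strip() == "1":
            repos[section] = parser.get(section, "baseurl", fallback=None)
    return repos


# Placeholder content for illustration only.
SAMPLE = """\
[example-repo]
name=Example repository (hypothetical)
baseurl=http://repo.example.com/centos6/
enabled=1
gpgcheck=0

[disabled-repo]
name=Disabled repository (hypothetical)
baseurl=http://repo.example.com/old/
enabled=0
"""

if __name__ == "__main__":
    for repo_id, url in enabled_repos(SAMPLE).items():
        print(repo_id, url)
```

Comparing this output across the hosts where the install fails and a host where it works can show whether a repo is missing or pointing at the wrong URL.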
Created 02-09-2016 05:09 PM
Same error. Repo checked, all looks good.