Created 12-13-2018 10:36 AM
I'm using an HDP 2.6.1.0-129 cluster with Cloudbreak Deployer 2.4.2. It worked well until Dec 7th, 2018. Since then, however, adding nodes to the cluster through the Cloudbreak Deployer always fails.
The following is the relevant error message from cbreak.log on the Cloudbreak Deployer:
cloudbreak_1 | 2018-12-13 06:12:11,102 [reactorDispatcher-34] buildLogContextForReactorHandler:69 INFO c.s.c.l.LogContextAspects - [owner:9e997395-c6d1-498d-bfa2-1a0f508c7b21] [type:CLUSTER] [id:2] [name:nakagawa-test-2] [flow:ec31867c-978c-41ef-8796-896ecabd98ba] [tracking:] A Reactor event handler's 'accept' method has been intercepted: execution(Flow2Handler.accept(..)), MDC logger context is built.
cloudbreak_1 | 2018-12-13 06:12:11,116 [reactorDispatcher-34] execute:87 INFO c.s.c.c.f.AbstractAction - [owner:9e997395-c6d1-498d-bfa2-1a0f508c7b21] [type:STACKVIEW] [id:2] [name:nakagawa-test-2] [flow:ec31867c-978c-41ef-8796-896ecabd98ba] [tracking:] Stack: 2, flow state: UPSCALING_AMBARI_STATE, phase: service, execution time 1024 sec
cloudbreak_1 | 2018-12-13 06:12:11,117 [reactorDispatcher-34] clusterUpscaleFailed:88 ERROR c.s.c.c.f.c.u.ClusterUpscaleFlowService - [owner:9e997395-c6d1-498d-bfa2-1a0f508c7b21] [type:STACKVIEW] [id:2] [name:nakagawa-test-2] [flow:ec31867c-978c-41ef-8796-896ecabd98ba] [tracking:] Error during Cluster upscale flow: com.sequenceiq.cloudbreak.orchestrator.exception.CloudbreakOrchestratorFailedException: Failed: Orchestrator component went failed in 7.500000 mins, message: There are missing nodes from job (jid: 20181213061135612084), target: [ip-10-0-1-203.ap-northeast-1.compute.internal, ip-10-0-1-68.ap-northeast-1.compute.internal]
cloudbreak_1 | Node: ip-10-0-1-203.ap-northeast-1.compute.internal Error(s): An exception occurred in this state: Traceback (most recent call last):
cloudbreak_1 | File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1843, in call
cloudbreak_1 | **cdata['kwargs'])
cloudbreak_1 | File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1795, in wrapper
cloudbreak_1 | return f(*args, **kwargs)
cloudbreak_1 | File "/usr/lib/python2.7/dist-packages/salt/states/pkg.py", line 1631, in installed
cloudbreak_1 | **kwargs)
cloudbreak_1 | File "/usr/lib/python2.7/dist-packages/salt/modules/yumpkg.py", line 1415, in install
cloudbreak_1 | if re.match('kernel(-.+)?', name):
cloudbreak_1 | File "/usr/lib64/python2.7/re.py", line 141, in match
cloudbreak_1 | return _compile(pattern, flags).match(string)
cloudbreak_1 | TypeError: expected string or buffer
cloudbreak_1 | | An exception occurred in this state: Traceback (most recent call last):
cloudbreak_1 | File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1843, in call
cloudbreak_1 | **cdata['kwargs'])
cloudbreak_1 | File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1795, in wrapper
cloudbreak_1 | return f(*args, **kwargs)
cloudbreak_1 | File "/usr/lib/python2.7/dist-packages/salt/states/pkg.py", line 1631, in installed
cloudbreak_1 | **kwargs)
cloudbreak_1 | File "/usr/lib/python2.7/dist-packages/salt/modules/yumpkg.py", line 1415, in install
cloudbreak_1 | if re.match('kernel(-.+)?', name):
cloudbreak_1 | File "/usr/lib64/python2.7/re.py", line 141, in match
cloudbreak_1 | return _compile(pattern, flags).match(string)
cloudbreak_1 | TypeError: expected string or buffe
I also found the relevant error message in /var/log/salt/minion on the node being added:
2018-12-13 05:36:43,749 [salt.state ][INFO ][8299] Running state [/etc/yum.repos.d/ambari.repo] at time 05:36:43.749534
2018-12-13 05:36:43,749 [salt.state ][INFO ][8299] Executing state file.managed for [/etc/yum.repos.d/ambari.repo]
2018-12-13 05:36:43,756 [salt.fileclient ][DEBUG ][8299] In saltenv 'base', looking at rel_path 'ambari/yum/ambari.repo' to resolve 'salt://ambari/yum/ambari.repo'
2018-12-13 05:36:43,757 [salt.fileclient ][DEBUG ][8299] In saltenv 'base', ** considering ** path '/var/cache/salt/minion/files/base/ambari/yum/ambari.repo' to resolve 'salt://ambari/yum/ambari.repo'
2018-12-13 05:36:43,757 [salt.utils.jinja ][DEBUG ][8299] Jinja search path: ['/var/cache/salt/minion/files/base']
2018-12-13 05:36:43,761 [salt.state ][INFO ][8299] File /etc/yum.repos.d/ambari.repo is in the correct state
2018-12-13 05:36:43,761 [salt.state ][INFO ][8299] Completed state [/etc/yum.repos.d/ambari.repo] at time 05:36:43.761816 duration_in_ms=12.282
2018-12-13 05:36:43,768 [salt.utils.lazy ][DEBUG ][8299] Could not LazyLoad pkg.ex_mod_init: 'pkg.ex_mod_init' is not available.
2018-12-13 05:36:43,768 [salt.state ][INFO ][8299] Running state [ambari-agent] at time 05:36:43.768609
2018-12-13 05:36:43,768 [salt.state ][INFO ][8299] Executing state pkg.installed for [ambari-agent]
2018-12-13 05:36:43,769 [salt.loaded.int.module.cmdmod ][INFO ][8299] Executing command ['rpm', '-qa', '--queryformat', '%{NAME}_|-%{EPOCH}_|-%{VERSION}_|-%{RELEASE}_|-%{ARCH}_|-(none)'] in directory '/root'
2018-12-13 05:36:44,277 [salt.utils.lazy ][DEBUG ][8299] Could not LazyLoad pkg.check_db: 'pkg.check_db' is not available.
2018-12-13 05:36:44,284 [salt.utils.lazy ][DEBUG ][8299] Could not LazyLoad pkg.check_extra_requirements: 'pkg.check_extra_requirements' is not available.
2018-12-13 05:36:44,290 [salt.utils.lazy ][DEBUG ][8299] Could not LazyLoad pkg.version_clean: 'pkg.version_clean' is not available.
2018-12-13 05:36:44,290 [salt.loaded.int.module.rpm ][WARNING ][8299] rpmdevtools is not installed, please install it for more accurate version comparisons
2018-12-13 05:36:44,291 [salt.loaded.int.states.pkg ][DEBUG ][8299] Current version (['2.6.2.0-155']) did not match desired version specification (2.6.2.0), adding to installation targets
2018-12-13 05:36:44,291 [salt.loaded.int.module.cmdmod ][INFO ][8299] Executing command ['yum', '--quiet', 'clean', 'expire-cache'] in directory '/root'
2018-12-13 05:36:44,435 [salt.loaded.int.module.cmdmod ][DEBUG ][8299] output:
2018-12-13 05:36:44,436 [salt.loaded.int.module.cmdmod ][INFO ][8299] Executing command ['yum', '--quiet', 'check-update'] in directory '/root'
2018-12-13 05:36:46,398 [salt.loaded.int.module.yumpkg ][DEBUG ][8299] Searching for repos in ['/etc/yum/repos.d', '/etc/yum.repos.d']
2018-12-13 05:36:46,401 [salt.loaded.int.module.cmdmod ][INFO ][8299] Executing command ['yum', '--version'] in directory '/root'
2018-12-13 05:36:46,518 [salt.loaded.int.module.cmdmod ][DEBUG ][8299] output: 3.4.3
Installed: rpm-4.11.3-21.75.amzn1.x86_64 at 2017-11-20 22:11
Built : Amazon.com, Inc. <http://aws.amazon.com> at 2017-03-20 00:58
Committed: Amazon Linux AMI <amazon-linux-ami@amazon.com> at 2016-11-04
Installed: yum-3.4.3-150.70.amzn1.noarch at 2017-11-20 22:11
Built : Amazon.com, Inc. <http://aws.amazon.com> at 2017-08-10 23:50
Committed: Heath Petty <hpetty@amazon.com> at 2017-08-10
2018-12-13 05:36:46,519 [salt.loaded.int.module.cmdmod ][INFO ][8299] Executing command ['yum', '--quiet', 'repository-packages', 'AMBARI.2.6.2.0', 'list', '--showduplicates'] in directory '/root'
2018-12-13 05:36:46,988 [salt.loaded.int.module.cmdmod ][INFO ][8299] Executing command ['yum', '--quiet', 'repository-packages', 'saltstack-amzn-repo', 'list', '--showduplicates'] in directory '/root'
2018-12-13 05:36:47,470 [salt.loaded.int.module.cmdmod ][INFO ][8299] Executing command ['yum', '--quiet', 'repository-packages', 'amzn-main', 'list', '--showduplicates'] in directory '/root'
2018-12-13 05:36:50,893 [salt.loaded.int.module.cmdmod ][INFO ][8299] Executing command ['yum', '--quiet', 'repository-packages', 'HDP-2.6-repo-1', 'list', '--showduplicates'] in directory '/root'
2018-12-13 05:36:51,478 [salt.loaded.int.module.cmdmod ][INFO ][8299] Executing command ['yum', '--quiet', 'repository-packages', 'amzn-updates', 'list', '--showduplicates'] in directory '/root'
2018-12-13 05:36:52,686 [salt.loaded.int.module.cmdmod ][INFO ][8299] Executing command ['yum', '--quiet', 'repository-packages', 'HDP-UTILS', 'list', '--showduplicates'] in directory '/root'
2018-12-13 05:36:52,772 [salt.minion ][INFO ][3466] User root Executing command saltutil.running with jid 20181213053652757085
2018-12-13 05:36:52,772 [salt.minion ][DEBUG ][3466] Command details {'tgt_type': 'glob', 'jid': '20181213053652757085', 'tgt': '*', 'ret': '', 'user': 'root', 'arg': [], 'fun': 'saltutil.running'}
2018-12-13 05:36:52,781 [salt.minion ][INFO ][8478] Starting a new job with PID 8478
2018-12-13 05:36:52,795 [salt.utils.lazy ][DEBUG ][8478] LazyLoaded saltutil.running
2018-12-13 05:36:52,796 [salt.utils.lazy ][DEBUG ][8478] LazyLoaded direct_call.get
2018-12-13 05:36:52,797 [salt.minion ][DEBUG ][8478] Minion return retry timer set to 9 seconds (randomized)
2018-12-13 05:36:52,797 [salt.minion ][INFO ][8478] Returning information for job: 20181213053652757085
2018-12-13 05:36:52,798 [salt.transport.zeromq ][DEBUG ][8478] Initializing new AsyncZeroMQReqChannel for ('/etc/salt/pki/minion', 'ip-10-0-1-68.ap-northeast-1.compute.internal', 'tcp://10.0.1.203:4506', 'aes')
2018-12-13 05:36:52,798 [salt.crypt ][DEBUG ][8478] Initializing new AsyncAuth for ('/etc/salt/pki/minion', 'ip-10-0-1-68.ap-northeast-1.compute.internal', 'tcp://10.0.1.203:4506')
2018-12-13 05:36:52,805 [salt.minion ][DEBUG ][8478] minion return: {'fun_args': [], 'jid': '20181213053652757085', 'return': [{'tgt_type': 'glob', 'jid': '20181213053640897161', 'tgt': '*', 'pid': 8299, 'ret': '', 'user': 'saltuser', 'arg': [], 'fun': 'state.highstate'}], 'retcode': 0, 'success': True, 'fun': 'saltutil.running'}
2018-12-13 05:36:53,152 [salt.loaded.int.module.cmdmod ][INFO ][8299] Executing command ['yum', '--quiet', 'repository-packages', 'HDP-UTILS-1.1.0.21-repo-1', 'list', '--showduplicates'] in directory '/root'
2018-12-13 05:36:53,778 [salt.loaded.int.module.rpm ][WARNING ][8299] rpmdevtools is not installed, please install it for more accurate version comparisons
2018-12-13 05:36:53,953 [salt.state ][ERROR ][8299] An exception occurred in this state: Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/salt/state.py", line 1843, in call
**cdata['kwargs'])
File "/usr/lib/python2.7/dist-packages/salt/loader.py", line 1795, in wrapper
return f(*args, **kwargs)
File "/usr/lib/python2.7/dist-packages/salt/states/pkg.py", line 1631, in installed
**kwargs)
File "/usr/lib/python2.7/dist-packages/salt/modules/yumpkg.py", line 1415, in install
if re.match('kernel(-.+)?', name):
File "/usr/lib64/python2.7/re.py", line 141, in match
return _compile(pattern, flags).match(string)
TypeError: expected string or buffer
Can you please help me find the solution or workaround?
Thank you in advance.
Created 12-13-2018 02:09 PM
The Ambari version information seems to have changed. In Cloudbreak version 2.4.2 there is no way to update this data through API calls. You have to update it manually in the database instead:
UPDATE clustercomponent SET attributes = '{"predefined":false,"version":"2.6.2.0-155","baseUrl":"http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.6.2.0","gpgKeyUrl":"http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins"}' WHERE componenttype = 'AMBARI_REPO_DETAILS' AND cluster_id = <id_of_the_cluster>;
The attribute value above is just an example. Please replace it with the one in your database, and edit the version field in the JSON to include the build number: "2.6.2.0-155"
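If editing the JSON by hand feels error-prone, the change can be sketched in a few lines of Python. This is only an illustration, not part of Cloudbreak: the helper name with_build_number is made up, and the sample row assumes the stored version is currently "2.6.2.0" (the value the minion log reports as the desired version). Paste in the attributes value you actually read from your own clustercomponent row.

```python
import json


def with_build_number(attributes_json, full_version):
    """Return the attributes JSON with "version" replaced by full_version.

    Hypothetical helper for illustration; all other fields are kept as they are.
    """
    attrs = json.loads(attributes_json)
    # Guard against pasting a build number that belongs to another release.
    if not full_version.startswith(attrs["version"]):
        raise ValueError(
            "%r does not extend stored version %r" % (full_version, attrs["version"])
        )
    attrs["version"] = full_version
    # Compact separators keep the single-line form used in the UPDATE statement.
    return json.dumps(attrs, separators=(",", ":"))


# Assumed example of the current attributes value; replace with your own.
current = (
    '{"predefined":false,"version":"2.6.2.0",'
    '"baseUrl":"http://public-repo-1.hortonworks.com/ambari/centos6/2.x/updates/2.6.2.0",'
    '"gpgKeyUrl":"http://public-repo-1.hortonworks.com/ambari/centos6/RPM-GPG-KEY/RPM-GPG-KEY-Jenkins"}'
)
print(with_build_number(current, "2.6.2.0-155"))
```

The printed string is what goes between the single quotes after SET attributes = in the UPDATE statement. Backing up the row before running the update is a sensible precaution.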
Created 12-19-2018 01:08 AM
Thank you for your quick response. The workaround worked for my environment.