Created 02-07-2018 09:57 AM
Hi,
On my HDP 2.6.3 cluster, I'm trying to decommission an HBase region server from Ambari (V2.5.2), but I get this error:
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/fr0-datalab-p23.bdata.corp@BDATA.CORP; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add fr0-datalab-p53.bdata.corp' returned 1. Error: Could not find or load main class org.jruby.Main
After a quick analysis, I realized that the jruby jar file has been moved to the $HBASE_HOME/lib/ruby/ folder (it was located in the $HBASE_HOME/lib/ folder in the HDP 2.5.3 release).
While trying to figure out how to fix this, I found that the hbase script invokes the hbase.distro script, which is supposed to add the jruby jar file to the classpath "only when required", meaning only when it receives "shell" or "org.jruby.Main" as its $1 argument (after the --config one).
When debugging its execution, I could see that it treats "-Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf" as $1, so it does not add the jruby jar file to the classpath.
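To illustrate the behaviour described above, here is a minimal Python sketch. It is a toy model only, not the actual hbase.distro script: the function name needs_jruby and the placeholder host name are made up for illustration, and the only logic it reproduces is "look at the first argument after --config <dir>".

```python
def needs_jruby(args):
    """Toy model: True if the first argument after '--config <dir>' selects JRuby."""
    rest = list(args)
    # Drop a leading '--config <dir>' pair before looking at the command.
    if len(rest) >= 2 and rest[0] == "--config":
        rest = rest[2:]
    # Only the first remaining argument is inspected, as described in the post.
    return bool(rest) and rest[0] in ("shell", "org.jruby.Main")


conf = "/usr/hdp/current/hbase-master/conf"
jaas = "-Djava.security.auth.login.config=" + conf + "/hbase_master_jaas.conf"
script = "/usr/hdp/current/hbase-master/bin/draining_servers.rb"

# Argument order used by Ambari: the -D option comes before org.jruby.Main.
ambari_args = ["--config", conf, jaas, "org.jruby.Main", script, "add", "host"]
# Argument order of the manual command: org.jruby.Main comes right after --config.
manual_args = ["--config", conf, "org.jruby.Main", script, "add", "host"]

print(needs_jruby(ambari_args))  # False: the -D option is seen as $1
print(needs_jruby(manual_args))  # True: org.jruby.Main is seen as $1
```

Under this model, any option placed between --config <dir> and org.jruby.Main is enough to hide the class name from the check.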
If I remove the -Djava... argument and manually execute "/usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add fr0-datalab-p39.bdata.corp", it seems to work properly (at least I cannot see any error in the terminal).
My question is pretty simple: what is the best way to fix this problem?
Thanks for your advice
Sebastien
Created 04-20-2018 10:36 PM
Hi Sebastien,
It looks like you may be running into the issue "Decommission RegionServer fails when kerberos is enabled" - https://issues.apache.org/jira/browse/AMBARI-22918.
I'm on Ambari 2.5.1 and ran into this yesterday. Since I can't upgrade to a version of Ambari where this has been fixed, I patched manually.
There were no changes to the file between 2.5.1 and 2.5.2, so the commands below should work for you. Please verify the path to hbase_decommission.py in your environment and make a backup of hbase_decommission.py just in case. Make sure the backup is outside of /var/lib/ambari-server/resources/ or you'll have to clean it up from all of your nodes.
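As a rough sketch of the backup step, assuming Python 3 is available on the Ambari server host: the helper names is_outside and backup_file and the backup directory /root/ambari-backups are hypothetical, chosen only for illustration.

```python
import os
import shutil

RESOURCES_DIR = "/var/lib/ambari-server/resources"


def is_outside(path, root):
    """Return True if path is not located under root."""
    path = os.path.abspath(path)
    root = os.path.abspath(root)
    return os.path.commonpath([path, root]) != root


def backup_file(src, backup_dir, root=RESOURCES_DIR):
    """Copy src into backup_dir, refusing directories under root."""
    if not is_outside(backup_dir, root):
        raise ValueError("backup directory must be outside " + root)
    os.makedirs(backup_dir, exist_ok=True)
    dest = os.path.join(backup_dir, os.path.basename(src) + ".bak")
    shutil.copy2(src, dest)  # copy2 also preserves timestamps and mode
    return dest


# Example call (hypothetical backup directory, adjust to your environment):
# backup_file(
#     RESOURCES_DIR + "/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py",
#     "/root/ambari-backups",
# )
```

The directory check is there only to enforce the advice above about keeping the backup out of the resources tree.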
Roll out patch:
curl -o /var/lib/ambari-server/resources/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py 'https://raw.githubusercontent.com/brfrn169/ambari/1a7237e5396114afd45cf55ee8c942d3037fbc3d/ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py'
Roll Back:
curl -o /var/lib/ambari-server/resources/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py 'https://raw.githubusercontent.com/apache/ambari/release-2.5.1/ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py'
Thanks,
Alan
Created 09-14-2020 02:26 AM
Hello,
We have the same error and are hoping for a response to the question at hand.
We get the error when trying to decommission a RegionServer.
stderr: /var/lib/ambari-agent/data/errors-3752.txt
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_master.py", line 111, in <module>
    HbaseMaster().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 375, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_master.py", line 55, in decommission
    hbase_decommission(env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py", line 84, in hbase_decommission
    logoutput=True
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/sktp2hhdp1mn03.xxxx.xx@XXXX.XX; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add sktp2hhdp1nn05.xxxx.xx' returned 1. Error: Could not find or load main class org.jruby.Main
stdout: /var/lib/ambari-agent/data/output-3752.txt
2020-09-14 11:18:32,905 - Stack Feature Version Info: Cluster Stack=2.6, Command Stack=None, Command Version=2.6.4.0-91 -> 2.6.4.0-91
2020-09-14 11:18:32,914 - Using hadoop conf dir: /usr/hdp/2.6.4.0-91/hadoop/conf
2020-09-14 11:18:32,918 - checked_call['hostid'] {}
2020-09-14 11:18:32,933 - checked_call returned (0, 'd70a5240')
2020-09-14 11:18:32,941 - File['/usr/hdp/current/hbase-master/bin/draining_servers.rb'] {'content': StaticFile('draining_servers.rb'), 'mode': 0755}
2020-09-14 11:18:32,943 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/sktp2hhdp1mn03.xxxx.xx@XXXX.XX; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add sktp2hhdp1nn05.xxxx.xx'] {'logoutput': True, 'user': 'hbase'}
Error: Could not find or load main class org.jruby.Main
Command failed after 1 tries