HBase regionserver decommissioning from Ambari

Rising Star

Hi,

On my HDP 2.6.3 cluster, I'm trying to decommission an HBase region server from Ambari (V2.5.2), but I get an error:

resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/fr0-datalab-p23.bdata.corp@BDATA.CORP; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add fr0-datalab-p53.bdata.corp' returned 1. Error: Could not find or load main class org.jruby.Main 

After a quick analysis, I realized that the jruby jar file has been moved to the $HBASE_HOME/lib/ruby/ folder (it was located in the $HBASE_HOME/lib/ folder in the HDP 2.5.3 release).

While trying to figure out how to fix this issue, I found that the hbase script invokes the hbase.distro script, which is supposed to add the jruby jar file to the classpath "only when required", meaning only when it receives "shell" or "org.jruby.Main" as its $1 argument (after the --config one).

When debugging its execution, I could see that it treats "-Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf" as $1, so it does not add the jruby jar file to the classpath.
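To make the behaviour concrete, here is a minimal Python sketch of the check as I understand it. It is not the actual hbase.distro code: `command_after_config` and `needs_jruby` are hypothetical names, and the logic is only my reading of the "$1" test described above.

```python
def command_after_config(argv):
    """Return the token a strict '$1' check would inspect once '--config <dir>' is consumed."""
    args = list(argv)
    if len(args) >= 2 and args[0] == "--config":
        args = args[2:]
    return args[0] if args else None


def needs_jruby(argv):
    """Hypothetical stand-in for the '$1' test: True only for 'shell' or 'org.jruby.Main'."""
    return command_after_config(argv) in ("shell", "org.jruby.Main")


conf = "/usr/hdp/current/hbase-master/conf"
jaas = "-Djava.security.auth.login.config=" + conf + "/hbase_master_jaas.conf"
script = "/usr/hdp/current/hbase-master/bin/draining_servers.rb"

# Argument order produced by Ambari: the -D option comes before the main class.
ambari_args = ["--config", conf, jaas, "org.jruby.Main", script, "add", "fr0-datalab-p53.bdata.corp"]
# Argument order used in the manual run: the main class comes right after --config.
manual_args = ["--config", conf, "org.jruby.Main", script, "add", "fr0-datalab-p39.bdata.corp"]

print(needs_jruby(ambari_args))  # False: the -D option is what gets inspected
print(needs_jruby(manual_args))  # True: org.jruby.Main is what gets inspected
```

With the Ambari ordering the check sees the -D option instead of org.jruby.Main, which would explain why the jruby jar never reaches the classpath.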

If I remove the -Djava... argument and manually execute "/usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add fr0-datalab-p39.bdata.corp", it seems to work properly (at least I cannot see any error in the terminal).

My question is pretty simple: what is the best way to fix this problem?

  • Either I change the way Ambari builds the command line to remove the -Djava option (but I'm not sure this will not break something else),
  • Or I update the hbase script to systematically add the jruby jar file (from its new location) to HBASE_CLASSPATH,
  • Or I update hbase.distro to check $2 instead of $1 as the command when deciding whether jruby is required.
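For the first option, one possibility would be to keep the JAAS setting but move it out of the argument list into the HBASE_OPTS environment variable, so that org.jruby.Main becomes the first argument after --config. Below is a rough Python sketch of how such a command string could be built. `build_drain_command` is a hypothetical helper, not Ambari code, and the approach assumes the HDP wrapper passes HBASE_OPTS through to the JVM the way the stock bin/hbase script does; I have not verified that on this release.

```python
import shlex


def build_drain_command(hbase_cmd, conf_dir, jvm_opt, script, action, host):
    """Build the drain command with the JVM option in HBASE_OPTS rather than in the arguments."""
    return (
        'HBASE_OPTS="$HBASE_OPTS {opt}" {cmd} --config {conf} '
        'org.jruby.Main {script} {action} {host}'
    ).format(opt=jvm_opt, cmd=hbase_cmd, conf=conf_dir,
             script=script, action=action, host=host)


conf = "/usr/hdp/current/hbase-master/conf"
cmd = build_drain_command(
    hbase_cmd="/usr/hdp/current/hbase-master/bin/hbase",
    conf_dir=conf,
    jvm_opt="-Djava.security.auth.login.config=" + conf + "/hbase_master_jaas.conf",
    script="/usr/hdp/current/hbase-master/bin/draining_servers.rb",
    action="add",
    host="fr0-datalab-p53.bdata.corp",
)

# Split the way a POSIX shell would, to check where the main class lands.
tokens = shlex.split(cmd)
print(cmd)
print(tokens[2:5])
```

The point of the split at the end is only to confirm that the main class now sits directly after "--config <dir>", which is where the check expects it.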

Thanks for your advice.

Sebastien

2 REPLIES

New Contributor

Hi Sebastian,

It looks like you may be running into the issue "Decommission RegionServer fails when kerberos is enabled" - https://issues.apache.org/jira/browse/AMBARI-22918.

I'm on Ambari 2.5.1 and ran into this yesterday. Since I can't upgrade to a version of Ambari where this has been fixed, I patched manually.

There were no changes to the file between 2.5.1 and 2.5.2, so the commands below should work for you. Please verify the path to hbase_decommission.py in your environment and make a backup of hbase_decommission.py just in case. Make sure the backup is outside of /var/lib/ambari-server/resources/ or you'll have to clean it up from all of your nodes.
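If you'd rather script the backup step, here is a small Python sketch. The helper names and the timestamped ".bak" naming are just my own choices for illustration, and the script path is the one from my environment, so adjust it to yours.

```python
import os
import shutil
import time

RESOURCES_DIR = "/var/lib/ambari-server/resources"
SCRIPT = os.path.join(
    RESOURCES_DIR,
    "common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py",
)


def backup_path(src, backup_dir, resources_dir=RESOURCES_DIR):
    """Return a timestamped backup path, refusing directories inside resources_dir."""
    backup_dir = os.path.abspath(backup_dir)
    resources_dir = os.path.abspath(resources_dir)
    if backup_dir == resources_dir or backup_dir.startswith(resources_dir + os.sep):
        raise ValueError("backup directory must be outside " + resources_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    return os.path.join(backup_dir, os.path.basename(src) + "." + stamp + ".bak")


def backup(src, backup_dir):
    """Copy src into backup_dir (created if missing) and return the backup path."""
    dest = backup_path(src, backup_dir)
    if not os.path.isdir(backup_dir):
        os.makedirs(backup_dir)
    shutil.copy2(src, dest)  # copy2 also preserves timestamps and permission bits
    return dest
```

Calling `backup(SCRIPT, "/root/ambari-backups")` on the Ambari server (the target directory is only an example) copies the file before you overwrite it, and raises ValueError if the target is under the resources directory.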

Roll out patch:

curl -o /var/lib/ambari-server/resources/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py \
  'https://raw.githubusercontent.com/brfrn169/ambari/1a7237e5396114afd45cf55ee8c942d3037fbc3d/ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py'

Roll back:

curl -o /var/lib/ambari-server/resources/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py \
  'https://raw.githubusercontent.com/apache/ambari/release-2.5.1/ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py'

Thanks,

Alan

New Contributor

Hello,

We have the same error and are hoping for a response to the question at hand.

We get this error when trying to decommission a RegionServer:

stderr:   /var/lib/ambari-agent/data/errors-3752.txt

Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_master.py", line 111, in <module>
    HbaseMaster().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 375, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_master.py", line 55, in decommission
    hbase_decommission(env)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HBASE/0.96.0.2.0/package/scripts/hbase_decommission.py", line 84, in hbase_decommission
    logoutput=True
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 166, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/sktp2hhdp1mn03.xxxx.xx@XXXX.XX; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add sktp2hhdp1nn05.xxxx.xx' returned 1. Error: Could not find or load main class org.jruby.Main

stdout:   /var/lib/ambari-agent/data/output-3752.txt

2020-09-14 11:18:32,905 - Stack Feature Version Info: Cluster Stack=2.6, Command Stack=None, Command Version=2.6.4.0-91 -> 2.6.4.0-91
2020-09-14 11:18:32,914 - Using hadoop conf dir: /usr/hdp/2.6.4.0-91/hadoop/conf
2020-09-14 11:18:32,918 - checked_call['hostid'] {}
2020-09-14 11:18:32,933 - checked_call returned (0, 'd70a5240')
2020-09-14 11:18:32,941 - File['/usr/hdp/current/hbase-master/bin/draining_servers.rb'] {'content': StaticFile('draining_servers.rb'), 'mode': 0755}
2020-09-14 11:18:32,943 - Execute['/usr/bin/kinit -kt /etc/security/keytabs/hbase.service.keytab hbase/sktp2hhdp1mn03.xxxx.xx@XXXX.XX; /usr/hdp/current/hbase-master/bin/hbase --config /usr/hdp/current/hbase-master/conf -Djava.security.auth.login.config=/usr/hdp/current/hbase-master/conf/hbase_master_jaas.conf org.jruby.Main /usr/hdp/current/hbase-master/bin/draining_servers.rb add sktp2hhdp1nn05.xxxx.xx'] {'logoutput': True, 'user': 'hbase'}
Error: Could not find or load main class org.jruby.Main

Command failed after 1 tries