Created 12-23-2015 12:23 AM
I re-start a one node cluster with Ambari. All service start sucessfully except Knox Gateway.
Oracle Linux 7
Apache Ambari Version 2.1.1
----
stderr:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox_gateway.py", line 267, in <module>
KnoxGateway().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 218, in execute
method(env)
File "/var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox_gateway.py", line 146, in start
self.configure(env)
File "/var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox_gateway.py", line 63, in configure
knox()
File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
return fn(*args, **kwargs)
File "/var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox.py", line 125, in knox
not_if=master_secret_exist,
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 258, in action_run
tries=self.resource.tries, try_sleep=self.resource.try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
tries=tries, try_sleep=try_sleep)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of '/usr/hdp/current/knox-server/bin/knoxcli.sh create-master --master [PROTECTED]' returned 1. Master secret is already present on disk. Please be aware that overwriting it will require updating other security artifacts. Use --force to overwrite the existing master secret.
ERROR: Invalid Command
Unrecognized option:create-master
A fatal exception has occurred. Program will exit.
stdout:
2015-12-22 18:46:33,793 - Directory['/var/lib/ambari-agent/data/tmp/AMBARI-artifacts/'] {'recursive': True}
2015-12-22 18:46:33,800 - File['/var/lib/ambari-agent/data/tmp/AMBARI-artifacts//jce_policy-8.zip'] {'content': DownloadSource('http://BODTEST02.bod.com.ve:8080/resources//jce_policy-8.zip')}
2015-12-22 18:46:33,801 - Not downloading the file from <a href="http://BODTEST02.bod.com.ve:8080/resources//jce_policy-8.zip">http://BODTEST02.bod.com.ve:8080/resources//jce_policy-8.zip</a>, because /var/lib/ambari-agent/data/tmp/jce_policy-8.zip already exists
2015-12-22 18:46:33,801 - Group['spark'] {'ignore_failures': False}
2015-12-22 18:46:33,801 - Group['hadoop'] {'ignore_failures': False}
2015-12-22 18:46:33,802 - Group['users'] {'ignore_failures': False}
2015-12-22 18:46:33,802 - Group['knox'] {'ignore_failures': False}
2015-12-22 18:46:33,802 - User['hive'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,803 - User['storm'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,803 - User['zookeeper'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,804 - User['oozie'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2015-12-22 18:46:33,810 - User['atlas'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,810 - User['ams'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,811 - User['falcon'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2015-12-22 18:46:33,811 - User['tez'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2015-12-22 18:46:33,812 - User['accumulo'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,812 - User['mahout'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,813 - User['spark'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,813 - User['ambari-qa'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'users']}
2015-12-22 18:46:33,814 - User['flume'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,820 - User['kafka'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,821 - User['hdfs'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,821 - User['sqoop'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,822 - User['yarn'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,822 - User['mapred'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,823 - User['hbase'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,823 - User['knox'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,824 - User['hcat'] {'gid': 'hadoop', 'ignore_failures': False, 'groups': [u'hadoop']}
2015-12-22 18:46:33,830 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2015-12-22 18:46:33,831 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2015-12-22 18:46:33,901 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2015-12-22 18:46:33,901 - Directory['/tmp/hbase-hbase'] {'owner': 'hbase', 'recursive': True, 'mode': 0775, 'cd_access': 'a'}
2015-12-22 18:46:33,902 - File['/var/lib/ambari-agent/data/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2015-12-22 18:46:33,903 - Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u hbase) -gt 1000) || (false)'}
2015-12-22 18:46:33,913 - Skipping Execute['/var/lib/ambari-agent/data/tmp/changeUid.sh hbase /home/hbase,/tmp/hbase,/usr/bin/hbase,/var/log/hbase,/tmp/hbase-hbase'] due to not_if
2015-12-22 18:46:33,913 - Group['hdfs'] {'ignore_failures': False}
2015-12-22 18:46:33,913 - User['hdfs'] {'ignore_failures': False, 'groups': [u'hadoop', u'hdfs']}
2015-12-22 18:46:33,914 - Directory['/etc/hadoop'] {'mode': 0755}
2015-12-22 18:46:33,939 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2015-12-22 18:46:33,969 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2015-12-22 18:46:34,011 - Skipping Execute[('setenforce', '0')] due to not_if
2015-12-22 18:46:34,011 - Directory['/var/log/hadoop'] {'owner': 'root', 'mode': 0775, 'group': 'hadoop', 'recursive': True, 'cd_access': 'a'}
2015-12-22 18:46:34,013 - Directory['/var/run/hadoop'] {'owner': 'root', 'group': 'root', 'recursive': True, 'cd_access': 'a'}
2015-12-22 18:46:34,013 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'recursive': True, 'cd_access': 'a'}
2015-12-22 18:46:34,017 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'hdfs'}
2015-12-22 18:46:34,026 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'hdfs'}
2015-12-22 18:46:34,026 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2015-12-22 18:46:34,045 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': Template('hadoop-metrics2.properties.j2'), 'owner': 'hdfs'}
2015-12-22 18:46:34,045 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2015-12-22 18:46:34,046 - File['/usr/hdp/current/hadoop-client/conf/configuration.xsl'] {'owner': 'hdfs', 'group': 'hadoop'}
2015-12-22 18:46:34,056 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}
2015-12-22 18:46:34,075 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2015-12-22 18:46:34,459 - Directory['/var/lib/knox/data'] {'owner': 'knox', 'group': 'knox', 'recursive': True}
2015-12-22 18:46:34,470 - Directory['/var/log/knox'] {'owner': 'knox', 'group': 'knox', 'recursive': True}
2015-12-22 18:46:34,470 - Directory['/var/run/knox'] {'owner': 'knox', 'group': 'knox', 'recursive': True}
2015-12-22 18:46:34,480 - Directory['/usr/hdp/current/knox-server/conf'] {'owner': 'knox', 'group': 'knox', 'recursive': True}
2015-12-22 18:46:34,488 - Directory['/usr/hdp/current/knox-server/conf/topologies'] {'owner': 'knox', 'group': 'knox', 'recursive': True}
2015-12-22 18:46:34,488 - XmlConfig['gateway-site.xml'] {'owner': 'knox', 'group': 'knox', 'conf_dir': '/usr/hdp/current/knox-server/conf', 'configuration_attributes': {}, 'configurations': ...}
2015-12-22 18:46:34,538 - Generating config: /usr/hdp/current/knox-server/conf/gateway-site.xml
2015-12-22 18:46:34,538 - File['/usr/hdp/current/knox-server/conf/gateway-site.xml'] {'owner': 'knox', 'content': InlineTemplate(...), 'group': 'knox', 'mode': None, 'encoding': 'UTF-8'}
2015-12-22 18:46:34,556 - Writing File['/usr/hdp/current/knox-server/conf/gateway-site.xml'] because contents don't match
2015-12-22 18:46:34,557 - File['/usr/hdp/current/knox-server/conf/gateway-log4j.properties'] {'content': ..., 'owner': 'knox', 'group': 'knox', 'mode': 0644}
2015-12-22 18:46:34,568 - File['/usr/hdp/current/knox-server/conf/topologies/default.xml'] {'content': InlineTemplate(...), 'owner': 'knox', 'group': 'knox'}
2015-12-22 18:46:34,568 - Execute[('chown', '-R', u'knox:knox', '/var/lib/knox/data', '/var/log/knox', u'/var/run/knox', '/usr/hdp/current/knox-server/conf', '/usr/hdp/current/knox-server/conf/topologies')] {'sudo': True}
2015-12-22 18:46:34,582 - Execute['/usr/hdp/current/knox-server/bin/knoxcli.sh create-master --master [PROTECTED]'] {'environment': {'JAVA_HOME': u'/usr/jdk64/jdk1.8.0_40'}, 'not_if': "ambari-sudo.sh su knox -l -s /bin/bash -c 'test -f /var/lib/knox/data/security/master'", 'user': 'knox'}
Created 01-15-2016 03:48 PM
Looking at the command that causes the error:
2016-01-15 15:37:00,440 - Execute['/usr/hdp/current/knox-server/bin/knoxcli.sh create-master --master [PROTECTED]'] {'environment': {'JAVA_HOME': u'/usr/jdk64/jdk1.7.0_67'}, 'not_if': "ambari-sudo.sh su knox -l -s /bin/bash -c 'test -f /var/lib/knox/data/security/master'", 'user': 'knox'}Originates to /var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox.py
cmd = format('{knox_client_bin} create-master --master {knox_master_secret!p}')
master_secret_exist = as_user(format('test -f {knox_master_secret_path}'), params.knox_user)
Execute(cmd,user=params.knox_user,environment={'JAVA_HOME': params.java_home},not_if=master_secret_exist,){knox_master_secret_path} resolves to /var/lib/knox/data/security/master (as defined in /var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/params_linux.py). The problem is that the Knox master file does not exist on this location. The directory /var/lib/knox/data does exist, but the content is empty.
Instead, the master key is located here: /usr/hdp/current/knox-server/data/security/master
In the file /var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/knox_gateway.py, I also see something with removing/setting symbolic links:
# Used to setup symlink, needed to update the knox managed symlink, in case of custom locations if os.path.islink(params.knox_managed_pid_symlink) and os.path.realpath(params.knox_managed_pid_symlink) != params.knox_pid_dir: os.unlink(params.knox_managed_pid_symlink) os.symlink(params.knox_pid_dir, params.knox_managed_pid_symlink)
Perhaps something goes wrong with the symbolic links? (when you install HDP2.3 successfully, but try to restart all services immediately after the installation?)
----
In any case, the following modification resolved the issue for me.. I'm not sure if it covers everything (e.g. what will happen if you change the Knox master key via the Ambari web interface??), but I don't have any more time to be stuck on this issue 🙂
Open /var/lib/ambari-agent/cache/common-services/KNOX/0.5.0.2.2/package/scripts/params_linux.py
Change:
knox_master_secret_path = '/var/lib/knox/data/security/master' knox_cert_store_path = '/var/lib/knox/data/security/keystores/gateway.jks'
to:
knox_master_secret_path = '/usr/hdp/current/knox-server/data/security/master' knox_cert_store_path = '/usr/hdp/current/knox-server/data/security/keystores/gateway.jks'
Looking forward to a response from a Hortonworks employee..
Created 11-07-2018 09:18 AM
Updating params_linux.py on both server and agent fixed my issue