Created 09-15-2017 12:37 PM
Using Cloudbreak I install a cluster and verify that it works. When I then reinstall the same cluster with security and the Knox gateway enabled, the cluster no longer installs correctly. Any help would be appreciated; I'm sure I have forgotten a step. I tried doing this both through the UI and from a script. (Blueprint attached in case that helps, but as my comment below notes, I was also able to reproduce this with one of the default blueprints: "Data Science: Apache Spark 2.1, Apache Zeppelin 0.7.0".)
Here's how I built the cluster:
credential select --name cloudbreakcredential
blueprint select --name "HA, zepplin and Ooziev2.7"
instancegroup configure --AZURE --instanceGroup master1 --nodecount 1 --templateName default-infrastructure-template-d4 --securityGroupName internal-ports-and-ssh --ambariServer false
instancegroup configure --AZURE --instanceGroup master2 --nodecount 1 --templateName default-infrastructure-template-d4 --securityGroupName internal-ports-and-ssh --ambariServer false
instancegroup configure --AZURE --instanceGroup master3 --nodecount 1 --templateName default-infrastructure-template-d4 --securityGroupName internal-ports-and-ssh --ambariServer false
instancegroup configure --AZURE --instanceGroup master4 --nodecount 1 --templateName default-infrastructure-template-d4 --securityGroupName internal-ports-and-ssh --ambariServer false
instancegroup configure --AZURE --instanceGroup Utility1 --nodecount 1 --templateName default-infrastructure-template --securityGroupName internal-ports-and-ssh --ambariServer true
instancegroup configure --AZURE --instanceGroup worker --nodecount 5 --templateName default-infrastructure-template --securityGroupName internal-ports-and-ssh --ambariServer false
#hostgroup configure --recipeNames ranger-pre-installation --hostgroup master4 --timeout 15
network select --name default-azure-network
stack create --AZURE --name hadoop-pilot-oozie-rg --region "Canada East" --wait true --attachedStorageType PER_VM
cluster create --description "Haoop Pilot" --password [password] --wait true --enableKnoxGateway --enableSecurity true --kerberosAdmin admin --kerberosMasterKey [masterkey] --kerberosPassword [password]
Created 09-20-2017 12:31 PM
Hi @Matt Andruff,
Cloudbreak does not populate the Kerberos-related settings into Knox's gateway-site.xml config file: https://github.com/apache/knox/blob/master/gateway-release/home/conf/gateway-site.xml#L40 For example, gateway.hadoop.kerberos.secured is not set and the krb5.conf location is not set.
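For reference, the Kerberos section of that default gateway-site.xml (it ships commented out) covers settings roughly like the ones below; on a secured cluster they would need to be populated with values matching the cluster's Kerberos setup. The values and paths shown here are only illustrative:

<property>
    <name>gateway.hadoop.kerberos.secured</name>
    <value>true</value>
</property>
<property>
    <name>java.security.krb5.conf</name>
    <value>/etc/krb5.conf</value>
</property>
<property>
    <name>java.security.auth.login.config</name>
    <value>/etc/knox/conf/krb5JAASLogin.conf</value>
</property>
<property>
    <name>sun.security.krb5.debug</name>
    <value>false</value>
</property>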
Until this is fixed in Cloudbreak I recommend not selecting "Enable Knox Gateway" in Cloudbreak; as a workaround you can add the Knox component (KNOX_GATEWAY) to the blueprint and let Ambari configure Knox, as sketched below.
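A minimal sketch of such a host group entry in the blueprint, assuming a hypothetical host group named master1 and illustrative co-located components (KNOX_GATEWAY is the Ambari component name for Knox in the HDP stack):

"host_groups": [
    {
        "name": "master1",
        "cardinality": "1",
        "components": [
            { "name": "KNOX_GATEWAY" },
            { "name": "NAMENODE" },
            { "name": "ZOOKEEPER_SERVER" }
        ]
    }
]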
Kind regards,
Attila
Created 09-15-2017 12:44 PM
Hi,
What do you mean by "when I enable security the cluster no longer works"? Would you mind attaching the relevant part of the Cloudbreak log where the exceptions appear? Also, I can see that you are using a custom security group; is port 9443 open in that group?
Created 09-15-2017 12:55 PM
I mean that if I install without security, the cluster starts up without issues. Yes, my security group does have 9443 enabled.
HiveServer2 fails to install:
stderr:
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server.py", line 227, in <module>
    HiveServer().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 314, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server.py", line 81, in start
    self.configure(env) # FOR SECURITY
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 117, in locking_configure
    original_configure(obj, *args, **kw)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive_server.py", line 52, in configure
    hive(name='hiveserver2')
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/HIVE/0.12.0.2.0/package/scripts/hive.py", line 141, in hive
    copy_to_hdfs("mapreduce", params.user_group, params.hdfs_user, skip=params.sysprep_skip_copy_tarballs_hdfs)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/copy_tarball.py", line 267, in copy_to_hdfs
    replace_existing_files=replace_existing_files,
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 555, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 552, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 287, in action_delayed
    self._create_resource()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 303, in _create_resource
    self._create_file(self.main_resource.resource.target, source=self.main_resource.resource.source, mode=self.mode)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 418, in _create_file
    self.util.run_command(target, 'CREATE', method='PUT', overwrite=True, assertable_result=False, file_to_put=source, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 199, in run_command
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --data-binary @/usr/hdp/2.5.5.0-157/hadoop/mapreduce.tar.gz -H 'Content-Type: application/octet-stream' --negotiate -u : 'http://had-m1.bt52pnivtndublvux4s5oursrh.ux.internal.cloudapp.net:50070/webhdfs/v1/hdp/apps/2.5.5.0-157/mapreduce/mapreduce.tar.gz?op=CREATE&user.name=hdfs&overwrite=True&permission=444'' returned status_code=403.
{
  "RemoteException": {
    "exception": "IOException",
    "javaClassName": "java.io.IOException",
    "message": "Failed to find datanode, suggest to check cluster health."
  }
[this is repeated multiple times as it retries] ...
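For context, the 403 from WebHDFS ("Failed to find datanode, suggest to check cluster health") indicates the NameNode sees no live DataNodes at that point. A quick way to confirm this from the Ambari server node is sketched below; it assumes the standard HDP keytab layout under /etc/security/keytabs and a placeholder principal name:

# On the Kerberized cluster, obtain a ticket for the hdfs headless principal first
# (the principal name varies per cluster; list it from the keytab):
sudo -u hdfs klist -kt /etc/security/keytabs/hdfs.headless.keytab
sudo -u hdfs kinit -kt /etc/security/keytabs/hdfs.headless.keytab <hdfs-headless-principal>

# Then ask the NameNode how many live DataNodes it can see:
sudo -u hdfs hdfs dfsadmin -report

# If the live DataNode count is 0, the DataNode logs on the worker nodes
# (/var/log/hadoop/hdfs/) usually show why registration failed after Kerberos was enabled.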
Created 09-15-2017 07:10 PM
I have reproduced this with the default "Data Science: Apache Spark 2.1, Apache Zeppelin 0.7.0" blueprint.