Created on 12-13-2017 10:59 PM - edited 05-07-2020 02:32 PM
The 0.14.0 release of Apache Knox includes the ability to dynamically determine topology endpoints for Hadoop services in Ambari-managed clusters. Before this release, users had to find each endpoint URL by navigating the Ambari UI (or combing through the various cluster configuration files) and then add it explicitly to their topology descriptors, a process with considerable potential for human error.
Support for a new, simplified topology descriptor has been added to leverage this dynamic endpoint discovery and facilitate provider configuration sharing across topologies.
This is a dramatic improvement in the usability of Knox.
Simplified descriptors facilitate both provider configuration sharing and service endpoint discovery. Rather than editing an XML topology descriptor, you can now create a simpler descriptor that declaratively specifies the desired contents of a topology; Knox uses it to generate a full topology descriptor and the corresponding deployment. Like full topology descriptors, simplified descriptors allow service URLs to be specified explicitly. However, if the URLs for a service are omitted, Knox attempts to discover them from the Hadoop cluster. Currently, this discovery is supported only for clusters managed by Ambari.
| Property | Description |
|---|---|
| discovery-address | The endpoint address of the discovery source |
| discovery-type | The discovery source type (currently, the only supported type is AMBARI) |
| discovery-user | The username with permission to access the discovery source. If omitted, Knox checks for an alias named ambari.discovery.user and uses its value if defined. |
| discovery-pwd-alias | The alias of the password for the user with permission to access the discovery source. If omitted, Knox checks for an alias named ambari.discovery.password and uses its value if defined. |
| provider-config-ref | A reference to a provider configuration in {GATEWAY_HOME}/conf/shared-providers/ |
| cluster | The name of the cluster from which the topology service endpoints should be determined |
| services | The collection of services to be included in the topology |
Two file formats are supported for two distinct purposes:
| Format | Purpose |
|---|---|
| YAML | Intended for hand-editing a simplified descriptor (because of its readability and support for comments) |
| JSON | Intended for API interaction |
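Because the Admin API requires descriptors in JSON, a hand-edited YAML descriptor has to be converted before it can be uploaded. The following is a minimal sketch of such a conversion, not a Knox tool: `yaml_descriptor_to_json` is a hypothetical helper that assumes the flat layout of the sample descriptor below (top-level `key : value` pairs plus a `services` list of `- name:` entries). It does not handle nested keys such as explicitly specified service URLs, quoted values, or trailing comments, so a real YAML parser is the safer choice for anything more elaborate.

```bash
# Hypothetical helper (not part of Knox): convert a flat YAML simplified
# descriptor into the JSON form expected by the Admin API.
# Handles only top-level "key : value" lines and a "services:" list of
# "- name: X" entries; values must not contain double quotes.
# $1 = YAML descriptor file
yaml_descriptor_to_json() {
  awk '
    function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
    /^[ \t]*#/ { next }          # skip comment lines
    /^[ \t]*$/ { next }          # skip blank lines
    /^---/     { next }          # skip the YAML document marker
    /^[ \t]*- name:/ {           # one entry in the services list
      v = trim(substr($0, index($0, ":") + 1))
      svc = svc (svc == "" ? "" : ",") "{\"name\":\"" v "\"}"
      next
    }
    {
      i = index($0, ":")         # split on the FIRST colon so URLs stay intact
      if (i == 0) next
      k = trim(substr($0, 1, i - 1))
      v = trim(substr($0, i + 1))
      if (k == "services") next  # the list itself is emitted in END
      props = props (props == "" ? "" : ",") "\"" k "\":\"" v "\""
    }
    END { printf "{%s\"services\":[%s]}\n", (props == "" ? "" : props ","), svc }
  ' "$1"
}
```

For example, `yaml_descriptor_to_json simple-sandbox_y.yml > simple-sandbox_y.json` would produce a file that could be uploaded the same way as the JSON descriptor in step 12 of the walkthrough.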
```yaml
---
discovery-address : http://sandbox.hortonworks.com:8080
discovery-user : maria_dev
discovery-pwd-alias : ambari.discovery.password
provider-config-ref : sandbox-providers
cluster: Sandbox
services:
  - name: NAMENODE
  - name: JOBTRACKER
  - name: WEBHDFS
  - name: WEBHCAT
  - name: OOZIE
  - name: WEBHBASE
  - name: HIVE
  - name: RESOURCEMANAGER
```
A Note About Aliases
The example above specifies the credentials Knox uses to interact with Ambari. If no credentials are specified, Knox queries the default aliases (ambari.discovery.user and ambari.discovery.password). The default aliases are sufficient when topology discovery interacts with only a single Ambari instance. Multiple Ambari instances, however, will most likely each require a different set of credentials; the discovery-user and discovery-pwd-alias properties exist for this purpose. Whether you use the default credential aliases or a custom password alias, the aliases must be defined before any attempt to deploy a topology from a simplified descriptor.
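To make the fallback rules from the property table concrete, the sketch below reports which user and password alias a YAML descriptor implies. It is a hypothetical helper, not part of Knox, and it only reads the descriptor; it does not check whether the aliases actually exist in the Knox credential store.

```bash
# Hypothetical helpers (not part of Knox).
# Print the trimmed value of the first "key : value" line for key $1 in file $2.
descriptor_value() {
  awk -v key="$1" '
    {
      i = index($0, ":")
      if (i == 0) next
      k = substr($0, 1, i - 1); gsub(/^[ \t]+|[ \t]+$/, "", k)
      if (k == key) {
        v = substr($0, i + 1); gsub(/^[ \t]+|[ \t]+$/, "", v)
        print v
        exit
      }
    }' "$2"
}

# Apply the documented fallbacks: an omitted discovery-user means Knox looks up
# the alias ambari.discovery.user; an omitted discovery-pwd-alias means it uses
# the alias ambari.discovery.password.
# $1 = YAML descriptor file
discovery_credentials() {
  local user pwd_alias
  user=$(descriptor_value discovery-user "$1")
  pwd_alias=$(descriptor_value discovery-pwd-alias "$1")
  if [ -n "$user" ]; then
    echo "user=$user"
  else
    echo "user-alias=ambari.discovery.user"
  fi
  echo "password-alias=${pwd_alias:-ambari.discovery.password}"
}
```

For a second Ambari instance, one might create a dedicated alias (the name prod.ambari.password here is purely illustrative) with `{GATEWAY_HOME}/bin/knoxcli.sh create-alias prod.ambari.password --value <password>` and reference it through discovery-pwd-alias.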
Sometimes, the same provider configuration is applied to multiple Knox topologies. Unlike XML topology descriptors, simplified descriptors do not contain provider configuration; instead, they contain references to external provider configuration. Because the provider configuration is externalized from the simplified descriptors, a single configuration can be applied to multiple topologies. This reduces duplicated configuration and avoids having to update multiple files when a policy change is required. Updating a provider configuration triggers an update of every topology that references it. The contents of an externalized provider configuration are identical to the gateway element of a full topology descriptor; the only difference is that it is defined in its own XML file in {GATEWAY_HOME}/conf/shared-providers/.
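Before changing a shared provider configuration, it can be useful to know which descriptors reference it, since those are the topologies that will be regenerated. The following sketch is a hypothetical helper, not a Knox command; it assumes the provider-config-ref key and its value sit on the same line, which holds for the YAML sample and for single-line JSON like the one in step 12.

```bash
# Hypothetical helper (not part of Knox): list descriptors in a directory whose
# provider-config-ref names the given provider configuration.
# $1 = descriptors directory, $2 = provider configuration name (no .xml suffix)
referencing_descriptors() {
  local f ref
  for f in "$1"/*.yml "$1"/*.yaml "$1"/*.json; do
    [ -f "$f" ] || continue   # skip glob patterns that matched nothing
    # Matches both YAML (key : value) and JSON ("key":"value") on one line.
    ref=$(sed -n 's/.*provider-config-ref"\{0,1\}[[:space:]]*:[[:space:]]*"\{0,1\}\([^",}[:space:]]*\).*/\1/p' "$f" | head -n 1)
    if [ "$ref" = "$2" ]; then
      basename "$f"
    fi
  done
}
```

For example, `referencing_descriptors {GATEWAY_HOME}/conf/descriptors sandbox-providers` would list the descriptor from the walkthrough once it has been copied into place.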
Effecting topology changes is as simple as modifying files in two specific directories. The {GATEWAY_HOME}/conf/shared-providers/ directory is where Knox looks for provider configurations. This directory is monitored for changes, so modifying a provider configuration file in it triggers updates to any referencing simplified descriptors in the {GATEWAY_HOME}/conf/descriptors/ directory. Take care when deleting a provider configuration that descriptors still reference: any subsequent modification of those descriptors will fail because the deleted provider configuration can no longer be found. Update all references before deleting the provider configuration.
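A quick way to catch the failure mode described above is to check that every descriptor's provider-config-ref still resolves to a file. The sketch below is hypothetical (not part of Knox) and assumes, as in the walkthrough, that a reference such as sandbox-providers maps to {GATEWAY_HOME}/conf/shared-providers/sandbox-providers.xml.

```bash
# Hypothetical helper (not part of Knox): report descriptors whose
# provider-config-ref has no matching XML file in conf/shared-providers.
# $1 = GATEWAY_HOME. Prints "descriptor -> missing ref" per dangling reference
# and returns non-zero if any were found.
check_provider_refs() {
  local f ref status=0
  for f in "$1"/conf/descriptors/*.yml "$1"/conf/descriptors/*.yaml "$1"/conf/descriptors/*.json; do
    [ -f "$f" ] || continue   # skip glob patterns that matched nothing
    ref=$(sed -n 's/.*provider-config-ref"\{0,1\}[[:space:]]*:[[:space:]]*"\{0,1\}\([^",}[:space:]]*\).*/\1/p' "$f" | head -n 1)
    if [ -n "$ref" ] && [ ! -f "$1/conf/shared-providers/$ref.xml" ]; then
      echo "$(basename "$f") -> $ref"
      status=1
    fi
  done
  return "$status"
}
```

Running this after removing or renaming a provider configuration, and before editing any descriptors, shows which references still need updating.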
Likewise, the {GATEWAY_HOME}/conf/descriptors/ directory is monitored for changes, so adding or modifying a simplified descriptor file in this directory triggers the generation and deployment of a topology. Conversely, deleting a descriptor from this directory results in the undeployment of the previously generated topology.
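Since Knox reacts to changes it detects in this directory, the generated topology may not exist the instant a descriptor is copied in (or vanish the instant it is deleted). A script can poll for the result. The sketch below is a hypothetical helper, not part of Knox, and it assumes the topology file takes its name from the descriptor file, as in the walkthrough where simple-sandbox_y.yml yields simple-sandbox_y.xml.

```bash
# Hypothetical helper (not part of Knox): wait for the topology generated from
# a descriptor to appear (after adding it) or disappear (after deleting it).
# $1 = GATEWAY_HOME, $2 = descriptor file name, $3 = present|absent,
# $4 = timeout in seconds (default 30)
wait_for_topology() {
  local base xml waited=0
  case "$3" in
    present|absent) ;;
    *) echo "expected 'present' or 'absent', got '$3'" >&2; return 2 ;;
  esac
  base=$(basename "$2")
  xml="$1/conf/topologies/${base%.*}.xml"   # simple-sandbox_y.yml -> simple-sandbox_y.xml
  while :; do
    if [ "$3" = present ] && [ -f "$xml" ]; then echo "$xml"; return 0; fi
    if [ "$3" = absent ] && [ ! -e "$xml" ]; then echo "$xml"; return 0; fi
    if [ "$waited" -ge "${4:-30}" ]; then
      echo "timed out waiting for $xml to be $3" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

The `present` mode corresponds to the check in step 5 of the walkthrough and the `absent` mode to step 11; the timeout guards against waiting forever if deployment fails, in which case gateway.log is the place to look.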
Generated topology XML descriptors include an element indicating that they were generated:
```xml
<generated>true</generated>
```
These generated topology XML files should not be modified directly. Any changes made to them may be overwritten by a change to the source descriptor, a change to the cluster configuration, or a gateway restart. Although deleting a generated topology file undeploys that topology, any of those events can cause the topology to be regenerated and redeployed.
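To avoid accidentally hand-editing a generated file, one can list the topologies that carry the marker. The sketch below is a hypothetical helper (not part of Knox) that simply searches each topology file for the `<generated>true</generated>` element shown above.

```bash
# Hypothetical helper (not part of Knox): list topology files that carry the
# <generated>true</generated> marker; change these through their source
# descriptors and provider configurations, not by editing them directly.
# $1 = topologies directory (e.g. {GATEWAY_HOME}/conf/topologies)
generated_topologies() {
  local f
  for f in "$1"/*.xml; do
    [ -f "$f" ] || continue   # skip the glob pattern if nothing matched
    if grep -Eq '<generated>[[:space:]]*true[[:space:]]*</generated>' "$f"; then
      basename "$f"
    fi
  done
}
```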
The Admin API and Admin UI disallow modifications to generated topologies. The Admin API does provide the ability to modify simple descriptors and provider configurations, and the Admin UI will provide a similar capability in the future. The only reliable means to modify generated topologies is through changes to their respective source descriptors and provider configurations, either directly on the gateway host or using the Admin API.
The Admin API has been augmented to support the management of provider configuration and simplified descriptor resources.
For complete API details, see the Admin API section of the Knox User Guide.
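The Admin API calls in step 12 below follow a regular pattern: a resource type (providerconfig or descriptors), a resource name, and a content type that depends on the resource. The wrappers below are a hypothetical sketch of that pattern, not part of Knox; KNOX_ADMIN_URL and KNOX_ADMIN_CREDS are assumed environment variables whose defaults match the sandbox demo.

```bash
# Hypothetical wrappers (not part of Knox) around the Admin API calls used in
# the walkthrough. KNOX_ADMIN_URL and KNOX_ADMIN_CREDS are assumptions.
KNOX_ADMIN_URL=${KNOX_ADMIN_URL:-https://localhost:8443/gateway/admin/api/v1}
KNOX_ADMIN_CREDS=${KNOX_ADMIN_CREDS:-admin:admin-password}

# $1 = resource type (providerconfig | descriptors), $2 = resource name
admin_api_url() {
  case "$1" in
    providerconfig|descriptors) printf '%s/%s/%s\n' "$KNOX_ADMIN_URL" "$1" "$2" ;;
    *) echo "unsupported resource type: $1" >&2; return 1 ;;
  esac
}

# Upload a file: provider configurations as XML, descriptors as JSON.
# $1 = resource type, $2 = resource name, $3 = local file
admin_put() {
  local url ctype
  url=$(admin_api_url "$1" "$2") || return 1
  if [ "$1" = providerconfig ]; then ctype=application/xml; else ctype=application/json; fi
  curl -iku "$KNOX_ADMIN_CREDS" -X PUT -H "Content-Type: $ctype" --data-binary "@$3" "$url"
}

# $1 = resource type, $2 = resource name
admin_delete() {
  local url
  url=$(admin_api_url "$1" "$2") || return 1
  curl -iku "$KNOX_ADMIN_CREDS" -X DELETE "$url"
}
```

For example, `admin_put providerconfig sandbox-providers sandbox-providers.xml` mirrors step 12a, and `admin_delete descriptors simple-sandbox` mirrors step 12e. Unlike `-d`, `--data-binary` sends the file without stripping newlines.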
0. Install the HDP Sandbox (https://hortonworks.com/downloads/#sandbox)
1. Create the discovery aliases
```bash
{GATEWAY_HOME}/bin/knoxcli.sh create-alias ambari.discovery.user --value maria_dev
{GATEWAY_HOME}/bin/knoxcli.sh create-alias ambari.discovery.password --value maria_dev
```
2. Start the demo LDAP server and the gateway
```bash
{GATEWAY_HOME}/bin/ldap.sh start
{GATEWAY_HOME}/bin/gateway.sh start
```
3. Create/copy a provider config to the {GATEWAY_HOME}/conf/shared-providers/ directory
Sample sandbox-providers.xml
```xml
<gateway>
    <provider>
        <role>authentication</role>
        <name>ShiroProvider</name>
        <enabled>true</enabled>
        <param>
            <name>sessionTimeout</name>
            <value>30</value>
        </param>
        <param>
            <name>main.ldapRealm</name>
            <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
        </param>
        <param>
            <name>main.ldapContextFactory</name>
            <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
        </param>
        <param>
            <name>main.ldapRealm.contextFactory</name>
            <value>$ldapContextFactory</value>
        </param>
        <param>
            <name>main.ldapRealm.userDnTemplate</name>
            <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
        </param>
        <param>
            <name>main.ldapRealm.contextFactory.url</name>
            <value>ldap://localhost:33389</value>
        </param>
        <param>
            <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
            <value>simple</value>
        </param>
        <param>
            <name>urls./**</name>
            <value>authcBasic</value>
        </param>
    </provider>
</gateway>
```
4. Create/copy a simple descriptor to the descriptors directory (you can use the YAML sample presented earlier in this article)
```bash
cp simple-sandbox_y.yml {GATEWAY_HOME}/conf/descriptors/
```
5. Verify {GATEWAY_HOME}/logs/gateway.log and the contents of the {GATEWAY_HOME}/conf/topologies directory. There should be a file named simple-sandbox_y.xml in the topologies directory.
6. Test the deployed topology by invoking a request to a proxied Hadoop service
```bash
curl -iku guest:guest-password 'https://localhost:8443/gateway/simple-sandbox_y/webhdfs/v1/?op=LISTSTATUS'
```
7. Modify the provider config (touching the file is enough to register a change):
```bash
touch {GATEWAY_HOME}/conf/shared-providers/sandbox-providers.xml
```
8. Verify that the topology was regenerated and redeployed: check the timestamps of {GATEWAY_HOME}/conf/descriptors/simple-sandbox_y.yml and {GATEWAY_HOME}/conf/topologies/simple-sandbox_y.xml, and review {GATEWAY_HOME}/logs/gateway.log.
9. Modify the descriptor
```bash
touch {GATEWAY_HOME}/conf/descriptors/simple-sandbox_y.yml
```
10. Check the timestamp of {GATEWAY_HOME}/conf/topologies/simple-sandbox_y.xml and review {GATEWAY_HOME}/logs/gateway.log to verify the regeneration and redeployment of the topology.
11. Delete the simple descriptor from {GATEWAY_HOME}/conf/descriptors, then verify that {GATEWAY_HOME}/conf/topologies/simple-sandbox_y.xml has been removed and check {GATEWAY_HOME}/logs/gateway.log for the undeployment of the topology.
12. Repeat steps 3-11 using the Admin API instead of filesystem copies
a. Deploy the provider configuration to the gateway:
```bash
curl -iku admin:admin-password https://localhost:8443/gateway/admin/api/v1/providerconfig/sandbox-providers -X PUT -H Content-Type:application/xml -d "@sandbox-providers.xml"
```
b. The API requires descriptors to be in the JSON format:
simple-sandbox_j.json
```json
{
  "discovery-address": "http://localhost:8080",
  "provider-config-ref": "sandbox-providers",
  "cluster": "Sandbox",
  "services": [
    {"name": "NAMENODE"},
    {"name": "JOBTRACKER"},
    {"name": "WEBHDFS"},
    {"name": "WEBHCAT"},
    {"name": "OOZIE"},
    {"name": "WEBHBASE"},
    {"name": "RESOURCEMANAGER"}
  ]
}
```
Deploy the JSON descriptor to the gateway:
```bash
curl -iku admin:admin-password https://localhost:8443/gateway/admin/api/v1/descriptors/simple-sandbox -X PUT -H Content-Type:application/json -d "@simple-sandbox_j.json"
```
c. Test the resulting deployed topology
```bash
curl -iku guest:guest-password 'https://localhost:8443/gateway/simple-sandbox/webhdfs/v1/?op=LISTSTATUS'
```
d. Try to delete the provider configuration. This should be refused, because the simple-sandbox descriptor (uploaded from simple-sandbox_j.json) references it:
```bash
curl -iku admin:admin-password 'https://localhost:8443/gateway/admin/api/v1/providerconfig/sandbox-providers' -X DELETE
```
e. Delete the referencing descriptor:
```bash
curl -iku admin:admin-password 'https://localhost:8443/gateway/admin/api/v1/descriptors/simple-sandbox' -X DELETE
```
f. Then, repeat the DELETE request from step d. It should succeed this time, because no descriptors reference the provider configuration anymore.
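The manual checks in the walkthrough (the request in steps 6 and 12c, and the timestamp comparisons in steps 8 and 10) can also be scripted. The helpers below are a hypothetical sketch, not part of Knox; GATEWAY_URL and GATEWAY_CREDS are assumed environment variables whose defaults match the sandbox demo. Note that on filesystems with one-second timestamp resolution, a regeneration that happens within the same second as the change may not register as "newer".

```bash
# Hypothetical helpers (not part of Knox) for scripting the walkthrough checks.
GATEWAY_URL=${GATEWAY_URL:-https://localhost:8443/gateway}
GATEWAY_CREDS=${GATEWAY_CREDS:-guest:guest-password}

# Build the proxied URL for a topology-relative path.
# $1 = topology name, $2 = service path such as 'webhdfs/v1/?op=LISTSTATUS'
topology_url() {
  printf '%s/%s/%s\n' "$GATEWAY_URL" "$1" "${2#/}"
}

# Print the HTTP status code of a proxied request ("000" if the connection
# failed) and succeed only when it is 200.
check_service() {
  local code
  code=$(curl -sk -o /dev/null -w '%{http_code}' -u "$GATEWAY_CREDS" "$(topology_url "$1" "$2")" || true)
  echo "$code"
  [ "$code" = 200 ]
}

# Poll until the generated topology file is newer than the changed source file
# (provider configuration or descriptor).
# $1 = generated topology file, $2 = changed file, $3 = timeout in seconds (default 30)
wait_for_regeneration() {
  local waited=0
  until [ "$1" -nt "$2" ]; do
    if [ "$waited" -ge "${3:-30}" ]; then
      echo "$1 is still not newer than $2" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For instance, after step 7 one might run `wait_for_regeneration {GATEWAY_HOME}/conf/topologies/simple-sandbox_y.xml {GATEWAY_HOME}/conf/shared-providers/sandbox-providers.xml && check_service simple-sandbox_y 'webhdfs/v1/?op=LISTSTATUS'`.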
Hopefully, the benefits of this new functionality are clear. Defining and deploying topologies for Ambari-managed Hadoop clusters is now easier and less error-prone. Provider configurations can now be shared by multiple topologies, reducing duplicate configuration and the associated potential for errors in managing changes. There are related UI enhancements coming soon, which will further ease the management of topologies, and continue the enhancement of Knox's usability.
Check out the User Guide for more details about these additions.