
Introduction

The 0.14.0 release of Apache Knox adds the ability to dynamically determine topology endpoints for Hadoop services in Ambari-managed clusters. Prior to this release, users had to find each of these endpoint URLs by navigating the Ambari UI (or combing through the various cluster configuration files) and add them explicitly to their topology descriptors, which left plenty of room for human error.

Support for a new, simplified topology descriptor has been added to leverage this dynamic endpoint discovery and facilitate provider configuration sharing across topologies.

This is a dramatic improvement in the usability of Knox.

Simplified Descriptors

Simplified descriptors are a means to facilitate provider configuration sharing and service endpoint discovery. Rather than editing an XML topology descriptor, you can now create a simpler descriptor that declaratively specifies the desired contents of a topology; Knox turns it into a full topology descriptor and the corresponding deployment. Simplified descriptors allow service URLs to be specified explicitly, just as full topology descriptors do. However, if the URLs for a service are omitted, Knox will attempt to discover them from the Hadoop cluster. Currently, discovery is supported only for clusters managed by Ambari.

Descriptor Properties

Property | Description
--- | ---
discovery-address | The endpoint address for the discovery source
discovery-type | The discovery source type (currently, the only supported type is AMBARI)
discovery-user | The username with permission to access the discovery source. If omitted, Knox checks for an alias named ambari.discovery.user and uses its value if defined.
discovery-pwd-alias | The alias of the password for the user with permission to access the discovery source. If omitted, Knox checks for an alias named ambari.discovery.password and uses its value if defined.
provider-config-ref | A reference to a provider configuration in {GATEWAY_HOME}/conf/shared-providers/ (the file name without its .xml extension, e.g. sandbox-providers for sandbox-providers.xml)
cluster | The name of the cluster from which the topology service endpoints should be determined
services | The collection of services to be included in the topology
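
Before dropping a descriptor into {GATEWAY_HOME}/conf/descriptors/, the properties above can be sanity-checked with a few lines of shell. This is a minimal sketch under stated assumptions: check_descriptor and descriptor_value are hypothetical helpers (not part of Knox), they only read flat "key : value" YAML lines, GATEWAY_HOME is assumed to be an environment variable pointing at the gateway installation, and treating discovery-address, provider-config-ref, cluster and services as required is our own assumption for the discovery use case.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for a YAML simplified descriptor (sketch only).
# Only flat "key : value" lines are read; nested YAML is not parsed.

# descriptor_value FILE KEY: print the value of a top-level "key : value" line
descriptor_value() {
  sed -n "s/^$2[[:space:]]*:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*\$/\1/p" "$1" | head -n 1
}

# check_descriptor FILE [GATEWAY_HOME]
# Prints one line per problem or note; returns 1 if a problem was found.
check_descriptor() {
  local file=$1 home=${2:-$GATEWAY_HOME} key ref status=0
  # Treated as required here (an assumption for the discovery use case)
  for key in discovery-address provider-config-ref cluster; do
    if [ -z "$(descriptor_value "$file" "$key")" ]; then
      echo "missing: $key"
      status=1
    fi
  done
  if ! grep -q '^services[[:space:]]*:' "$file"; then
    echo "missing: services"
    status=1
  fi
  # provider-config-ref names <ref>.xml in conf/shared-providers/
  ref=$(descriptor_value "$file" provider-config-ref)
  if [ -n "$ref" ] && [ ! -f "$home/conf/shared-providers/$ref.xml" ]; then
    echo "provider config not found: $home/conf/shared-providers/$ref.xml"
    status=1
  fi
  # Omitted credentials fall back to the default aliases
  [ -n "$(descriptor_value "$file" discovery-user)" ] ||
    echo "note: discovery-user omitted, alias ambari.discovery.user will be checked"
  [ -n "$(descriptor_value "$file" discovery-pwd-alias)" ] ||
    echo "note: discovery-pwd-alias omitted, alias ambari.discovery.password will be checked"
  return $status
}
```

For example, `check_descriptor simple-sandbox_y.yml && cp simple-sandbox_y.yml "$GATEWAY_HOME/conf/descriptors/"` only copies the descriptor when no problems were reported.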

 

File Formats

Two file formats are supported for two distinct purposes:

Format | Purpose
--- | ---
YAML | Intended for people hand-editing simplified descriptors (because of its readability and support for comments)
JSON | Intended for API interaction

YAML Example (based on the HDP Docker Sandbox)

---
discovery-address : http://sandbox.hortonworks.com:8080
discovery-user : maria_dev
discovery-pwd-alias : ambari.discovery.password
provider-config-ref : sandbox-providers
cluster: Sandbox
services:
    - name: NAMENODE
    - name: JOBTRACKER
    - name: WEBHDFS
    - name: WEBHCAT
    - name: OOZIE
    - name: WEBHBASE
    - name: HIVE
    - name: RESOURCEMANAGER
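
Since YAML is meant for hand-editing and JSON for the Admin API, it can be handy to convert one into the other. The sketch below is a hypothetical helper, not a general YAML parser: it assumes a descriptor shaped like the sample above (top-level "key : value" pairs plus a services list of "- name: X" items), no per-service nested properties, no inline comments, and no quotes or backslashes inside values.

```bash
#!/usr/bin/env bash
# Hypothetical converter for flat simplified descriptors: YAML in, JSON out.
# Handles only the shape of the sample descriptor; NOT a general YAML parser.
yaml_descriptor_to_json() {
  awk '
    function trim(s) { sub(/^[ \t]+/, "", s); sub(/[ \t]+$/, "", s); return s }
    /^[ \t]*#/ { next }        # full-line comment
    /^[ \t]*$/ { next }        # blank line
    /^---/     { next }        # YAML document marker
    {
      line = trim($0)
      if (line ~ /:$/) {                          # e.g. "services:"
        key = trim(substr(line, 1, length(line) - 1)); val = ""
      } else if (match(line, /[ \t]*:[ \t]+/)) {  # first ": " splits key and value
        key = trim(substr(line, 1, RSTART - 1))
        val = trim(substr(line, RSTART + RLENGTH))
      } else {
        next
      }
      if (key == "- name") {                      # an entry in the services list
        svcs = svcs (svcs == "" ? "" : ",") "{\"name\":\"" val "\"}"
      } else if (key != "services") {             # any other top-level property
        props = props (props == "" ? "" : ",") "\"" key "\":\"" val "\""
      }
    }
    END { printf "{%s%s\"services\":[%s]}\n", props, (props == "" ? "" : ","), svcs }
  ' "$1"
}
```

Usage: `yaml_descriptor_to_json simple-sandbox_y.yml > simple-sandbox_j.json`. Note that converting the full sample would also carry over discovery-user and discovery-pwd-alias, which the hand-written JSON in the Try It steps leaves out.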


A Note About Aliases

This example shows how to specify the credentials used to interact with Ambari. If no credentials are specified, the default aliases are queried. The default aliases are sufficient when topology discovery interacts with only a single Ambari instance. With multiple Ambari instances, however, each will most likely require a different set of credentials; the discovery-user and discovery-pwd-alias properties exist for this purpose. Whether you use the default credential aliases or specify a custom password alias, these aliases must be defined before any attempt to deploy a topology from a simplified descriptor.
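
The ordering requirement (alias first, descriptor second) can be captured in a small helper when several Ambari instances are involved. This is a sketch only: add_ambari_descriptor is hypothetical, the "<name>.discovery.password" alias naming and the WEBHDFS-only service list are illustrative assumptions, and GATEWAY_HOME is assumed to be set in the environment. Like step 1 of the walkthrough, it passes the password to knoxcli.sh create-alias with --value.

```bash
#!/usr/bin/env bash
# Hypothetical helper: one descriptor per Ambari instance, each with its own
# password alias. The alias is created BEFORE the descriptor is written into
# the monitored conf/descriptors/ directory.
# add_ambari_descriptor NAME AMBARI_URL AMBARI_USER AMBARI_PWD CLUSTER PROVIDERS
add_ambari_descriptor() {
  local name=$1 ambari_url=$2 ambari_user=$3 ambari_pwd=$4 cluster=$5 providers=$6
  local home=${GATEWAY_HOME:?GATEWAY_HOME must be set}
  local alias="$name.discovery.password"   # illustrative naming convention

  # 1. Define the password alias first
  "$home/bin/knoxcli.sh" create-alias "$alias" --value "$ambari_pwd" || return 1

  # 2. Only then write the simplified descriptor that refers to it
  cat > "$home/conf/descriptors/$name.yml" <<EOF
---
discovery-address : $ambari_url
discovery-user : $ambari_user
discovery-pwd-alias : $alias
provider-config-ref : $providers
cluster: $cluster
services:
    - name: WEBHDFS
EOF
}

# Example (hypothetical hosts and credentials):
# add_ambari_descriptor prod http://ambari-prod:8080 admin 'prodSecret' Prod sandbox-providers
# add_ambari_descriptor dev  http://ambari-dev:8080  admin 'devSecret'  Dev  sandbox-providers
```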

 

Externalized Provider Configurations

Sometimes, the same provider configuration is applied to multiple Knox topologies. Unlike XML topology descriptors, simplified descriptors do not contain provider configuration; instead, they contain a reference to an external provider configuration. Because the provider configuration is externalized from the simplified descriptors, a single configuration can be applied to multiple topologies. This reduces duplicated configuration and the need to update multiple files when a policy changes. Updating a provider configuration triggers an update to every topology that references it. The contents of an externalized provider configuration are identical to the gateway element of a full topology descriptor; the only difference is that it is defined in its own XML file in {GATEWAY_HOME}/conf/shared-providers/.
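
Because a shared provider configuration is just the gateway element of a full topology, one way to bootstrap it is to copy that element out of an existing topology. The sketch below is hypothetical (extract_provider_config is not a Knox tool) and line-based rather than a real XML parser: it assumes <gateway> and </gateway> each sit on their own line, as in typical hand-written topologies.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy the <gateway> element of an existing full topology
# into conf/shared-providers/NAME.xml so simplified descriptors can reference it.
# extract_provider_config TOPOLOGY_FILE NAME [GATEWAY_HOME]
extract_provider_config() {
  local topology=$1 name=$2 home=${3:-$GATEWAY_HOME}
  local out="$home/conf/shared-providers/$name.xml"
  # Refuse to write anything if the topology has no <gateway> element
  if ! grep -q '<gateway>' "$topology"; then
    echo "no <gateway> element found in $topology" >&2
    return 1
  fi
  # Print every line from the one containing <gateway> through </gateway>
  sed -n '/<gateway>/,/<\/gateway>/p' "$topology" > "$out"
  echo "$out"
}
```

A descriptor can then use `provider-config-ref : NAME` to pick up the extracted configuration.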

 

Monitored Directories

Effecting topology changes is as simple as modifying files in two specific directories. The {GATEWAY_HOME}/conf/shared-providers/ directory is where Knox looks for provider configurations. This directory is monitored for changes, so modifying a provider configuration file there triggers updates to every referencing simplified descriptor in the {GATEWAY_HOME}/conf/descriptors/ directory. Take care when deleting these files while descriptors still reference them: any subsequent modification of a referencing descriptor will fail because the deleted provider configuration cannot be found. Update all of the references before deleting a provider configuration.
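
To see which descriptors still reference a provider configuration (for example, before deleting it), a simple search of the descriptors directory is usually enough. This is a sketch: referencing_descriptors is a hypothetical helper that greps for a provider-config-ref entry in either YAML ("provider-config-ref : name") or JSON ("provider-config-ref":"name") form, and it assumes the provider name contains no regular-expression metacharacters.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list descriptors in conf/descriptors/ that reference
# the given provider configuration.
# referencing_descriptors PROVIDER_NAME [GATEWAY_HOME]
referencing_descriptors() {
  local ref=$1 home=${2:-$GATEWAY_HOME}
  # Optional quotes around key and value; value must end the line or be
  # followed by "," or "}" so that e.g. "sandbox-providers-v2" does not match.
  local pattern="\"?provider-config-ref\"?[[:space:]]*:[[:space:]]*\"?$ref\"?[[:space:]]*([,}]|\$)"
  find "$home/conf/descriptors" -maxdepth 1 -type f \
    \( -name '*.yml' -o -name '*.yaml' -o -name '*.json' \) \
    -exec grep -lE "$pattern" {} +
}

# Usage: only delete a provider configuration once nothing references it
# refs=$(referencing_descriptors sandbox-providers)
# [ -z "$refs" ] && rm "$GATEWAY_HOME/conf/shared-providers/sandbox-providers.xml"
```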

Likewise, the {GATEWAY_HOME}/conf/descriptors/ directory is monitored for changes: adding or modifying a simplified descriptor file in this directory triggers the generation and deployment of a topology, and deleting a descriptor from this directory undeploys the previously generated topology.
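
Generation and deployment happen asynchronously, so scripts that add or touch a descriptor may want to wait for the result. The sketch below assumes, based on the Try It steps later in this article, that the generated topology takes the descriptor's file name (descriptors/simple-sandbox_y.yml becomes topologies/simple-sandbox_y.xml); wait_for_topology is a hypothetical helper that simply polls once per second.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until the topology generated from a descriptor
# exists and is at least as new as the descriptor.
# wait_for_topology DESCRIPTOR_FILE [TIMEOUT_SECONDS] [GATEWAY_HOME]
wait_for_topology() {
  local desc=$1 timeout=${2:-30} home=${3:-$GATEWAY_HOME}
  local name topo waited=0
  name=$(basename "$desc")
  name=${name%.*}                      # strip .yml/.yaml/.json
  topo="$home/conf/topologies/$name.xml"
  while :; do
    # Done once the topology exists and is not older than the descriptor
    if [ -f "$topo" ] && ! [ "$desc" -nt "$topo" ]; then
      echo "$topo"
      return 0
    fi
    [ "$waited" -ge "$timeout" ] && break
    sleep 1
    waited=$((waited + 1))
  done
  echo "timed out waiting for $topo" >&2
  return 1
}
```

For example, after `touch "$GATEWAY_HOME/conf/descriptors/simple-sandbox_y.yml"`, calling `wait_for_topology "$GATEWAY_HOME/conf/descriptors/simple-sandbox_y.yml" 60` returns once the regenerated topology file has been written. It does not confirm that deployment succeeded; gateway.log remains the place to check that.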

 

Generated Topologies

Generated topology XML descriptors include an element to indicate the fact that they've been generated.

<generated>true</generated>

Do not modify these generated topology XML files directly. Any changes made to them may be overwritten when the source descriptor changes, when the cluster configuration changes, or when the gateway restarts. Likewise, although deleting a generated topology file undeploys that topology, any of those events can regenerate and redeploy it.
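
A quick way to tell generated topologies from hand-maintained ones before editing anything is to look for the generated marker. This is a sketch: classify_topologies is a hypothetical helper that only checks for a <generated>true</generated> element and does not parse the XML.

```bash
#!/usr/bin/env bash
# Hypothetical helper: label each topology in conf/topologies/ as "generated"
# (edit its source descriptor or provider configuration instead) or "manual".
# classify_topologies [GATEWAY_HOME]
classify_topologies() {
  local home=${1:-$GATEWAY_HOME} f
  find "$home/conf/topologies" -maxdepth 1 -type f -name '*.xml' | sort |
    while IFS= read -r f; do
      if grep -q '<generated>[[:space:]]*true[[:space:]]*</generated>' "$f"; then
        echo "generated $(basename "$f")"
      else
        echo "manual    $(basename "$f")"
      fi
    done
}
```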

The Admin API and Admin UI do not allow generated topologies to be modified. The Admin API can, however, modify simplified descriptors and provider configurations, and the Admin UI is expected to provide a similar capability in the future. The only reliable way to modify a generated topology is to change its source descriptor or provider configuration, either directly on the gateway host or through the Admin API.

Admin API

The Admin API has been augmented to support the management of provider configuration and simplified descriptor resources.

  • Get a list of the current provider configurations deployed to the gateway: /gateway/admin/api/v1/providerconfig
  • Get/Put/Delete the provider configuration identified by {id}: /gateway/admin/api/v1/providerconfig/{id}
  • Get a list of the current descriptors deployed to the gateway: /gateway/admin/api/v1/descriptors
  • Get/Put/Delete the descriptor identified by {id}: /gateway/admin/api/v1/descriptors/{id}

For complete API details, see the Admin API section of the Knox User Guide.
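
The endpoints above map naturally onto a few curl wrappers. This is a minimal sketch: the function names and the KNOX_GATEWAY_URL and KNOX_ADMIN_CREDS variables are hypothetical, and their defaults simply mirror the demo values used in the Try It steps. It uses --data-binary rather than -d so that uploaded files are sent unmodified (-d strips newlines).

```bash
#!/usr/bin/env bash
# Hypothetical curl wrappers for the Admin API endpoints listed above.
KNOX_GATEWAY_URL=${KNOX_GATEWAY_URL:-https://localhost:8443/gateway}
KNOX_ADMIN_CREDS=${KNOX_ADMIN_CREDS:-admin:admin-password}

# admin_api METHOD PATH [curl options...]
# -k skips TLS verification (acceptable for the demo certificate only);
# -s hides the progress meter.
admin_api() {
  local method=$1 path=$2
  shift 2
  curl -sk -u "$KNOX_ADMIN_CREDS" -X "$method" "$@" "$KNOX_GATEWAY_URL/admin/api/v1/$path"
}

list_provider_configs() { admin_api GET providerconfig; }
list_descriptors()      { admin_api GET descriptors; }

# put_provider_config ID FILE   (XML provider configuration)
put_provider_config() {
  admin_api PUT "providerconfig/$1" -H 'Content-Type: application/xml' --data-binary "@$2"
}

# put_descriptor ID FILE   (the API expects JSON descriptors)
put_descriptor() {
  admin_api PUT "descriptors/$1" -H 'Content-Type: application/json' --data-binary "@$2"
}

delete_provider_config() { admin_api DELETE "providerconfig/$1"; }
delete_descriptor()      { admin_api DELETE "descriptors/$1"; }
```

For example, `put_descriptor simple-sandbox simple-sandbox_j.json` issues the same request as the descriptor upload in step 12 below.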

 

Try It!

0. Install the HDP Sandbox (https://hortonworks.com/downloads/#sandbox)

1. Create the discovery aliases

{GATEWAY_HOME}/bin/knoxcli.sh create-alias ambari.discovery.user --value maria_dev
{GATEWAY_HOME}/bin/knoxcli.sh create-alias ambari.discovery.password --value maria_dev


2. Start the demo LDAP server and the gateway

{GATEWAY_HOME}/bin/ldap.sh start
{GATEWAY_HOME}/bin/gateway.sh start


3. Create/copy a provider config to the {GATEWAY_HOME}/conf/shared-providers/ directory

Sample sandbox-providers.xml

<gateway>
 <provider>
  <role>authentication</role>
  <name>ShiroProvider</name>
  <enabled>true</enabled>
  <param>
   <name>sessionTimeout</name>
   <value>30</value>
  </param>
  <param>
   <name>main.ldapRealm</name>
   <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
  </param>
  <param>
   <name>main.ldapContextFactory</name>
   <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
  </param>
  <param>
   <name>main.ldapRealm.contextFactory</name>
   <value>$ldapContextFactory</value>
  </param>
  <param>
   <name>main.ldapRealm.userDnTemplate</name>
   <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
  </param>
  <param>
   <name>main.ldapRealm.contextFactory.url</name>
   <value>ldap://localhost:33389</value>
  </param>
  <param>
   <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
   <value>simple</value>
  </param>
  <param>
   <name>urls./**</name>
   <value>authcBasic</value>
  </param>
 </provider>
</gateway>


4. Create/copy a simple descriptor to the descriptors directory (for example, save the YAML sample presented earlier in this article as simple-sandbox_y.yml)

cp simple-sandbox_y.yml {GATEWAY_HOME}/conf/descriptors/


5. Check {GATEWAY_HOME}/logs/gateway.log and the contents of the {GATEWAY_HOME}/conf/topologies directory. There should now be a file named simple-sandbox_y.xml in the topologies directory.

6. Test the deployed topology by invoking a request to a proxied Hadoop service

curl -iku guest:guest-password 'https://localhost:8443/gateway/simple-sandbox_y/webhdfs/v1/?op=LISTSTATUS'


7. Modify the provider config (touching the file is enough to simulate a change)

touch {GATEWAY_HOME}/conf/shared-providers/sandbox-providers.xml


8. Verify that the topology was regenerated and redeployed: check the timestamps of {GATEWAY_HOME}/conf/descriptors/simple-sandbox_y.yml and {GATEWAY_HOME}/conf/topologies/simple-sandbox_y.xml, and check {GATEWAY_HOME}/logs/gateway.log.

9. Modify the descriptor

touch {GATEWAY_HOME}/conf/descriptors/simple-sandbox_y.yml


10. Check the timestamp of {GATEWAY_HOME}/conf/topologies/simple-sandbox_y.xml, and {GATEWAY_HOME}/logs/gateway.log to verify the regeneration and redeployment of the topology.

11. Delete the simple descriptor from {GATEWAY_HOME}/conf/descriptors

rm {GATEWAY_HOME}/conf/descriptors/simple-sandbox_y.yml

Then verify the removal of {GATEWAY_HOME}/conf/topologies/simple-sandbox_y.xml, and check {GATEWAY_HOME}/logs/gateway.log to verify undeployment of the topology.

12. Repeat steps 3-11 using the Admin API instead of filesystem copies

a. Deploy the provider configuration to the gateway:

curl -iku admin:admin-password https://localhost:8443/gateway/admin/api/v1/providerconfig/sandbox-providers -X PUT -H Content-Type:application/xml -d "@sandbox-providers.xml"


b. The API requires descriptors to be in the JSON format:

simple-sandbox_j.json

{
  "discovery-address":"http://localhost:8080",
  "provider-config-ref":"sandbox-providers",
  "cluster":"Sandbox",
  "services":[
      {"name":"NAMENODE"},
      {"name":"JOBTRACKER"},
      {"name":"WEBHDFS"},
      {"name":"WEBHCAT"},
      {"name":"OOZIE"},
      {"name":"WEBHBASE"},
      {"name":"RESOURCEMANAGER"}
  ]
}

Deploy the JSON descriptor to the gateway:

curl -iku admin:admin-password https://localhost:8443/gateway/admin/api/v1/descriptors/simple-sandbox -X PUT -H Content-Type:application/json -d "@simple-sandbox_j.json"


c. Test the resulting deployed topology

curl -iku guest:guest-password 'https://localhost:8443/gateway/simple-sandbox/webhdfs/v1/?op=LISTSTATUS'


d. Try to delete the provider configuration. This should be refused, because the simple-sandbox descriptor references it:

curl -iku admin:admin-password 'https://localhost:8443/gateway/admin/api/v1/providerconfig/sandbox-providers' -X DELETE


e. Delete the referencing descriptor:

curl -iku admin:admin-password 'https://localhost:8443/gateway/admin/api/v1/descriptors/simple-sandbox' -X DELETE


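f. Then try to delete the provider configuration again. This time it should succeed, because no descriptors reference it:

curl -iku admin:admin-password 'https://localhost:8443/gateway/admin/api/v1/providerconfig/sandbox-providers' -X DELETE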
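
Step 12 (a through f) can also be run as a single script that prints only each request's HTTP status code, which makes it easy to compare the refused deletion in step d with the successful one in step f. This is a sketch under stated assumptions: it is run from the directory holding sandbox-providers.xml and simple-sandbox_j.json, KNOX_GW and DEPLOY_WAIT are hypothetical variable names, and DEPLOY_WAIT is a fixed pause (in seconds) to give the gateway time to deploy the generated topology. No particular status codes are assumed; curl reports 000 if it cannot connect.

```bash
#!/usr/bin/env bash
# Sketch of step 12 (a-f): print "LABEL: <HTTP status>" for each request.
KNOX_GW=${KNOX_GW:-https://localhost:8443/gateway}
ADMIN_API="$KNOX_GW/admin/api/v1"

# report LABEL CREDS METHOD URL [curl options...]
# Discards the response body and prints only the HTTP status code.
report() {
  local label=$1 creds=$2 method=$3 url=$4
  shift 4
  printf '%s: %s\n' "$label" \
    "$(curl -sk -o /dev/null -w '%{http_code}' -u "$creds" -X "$method" "$@" "$url")"
}

step12() {
  local admin=admin:admin-password guest=guest:guest-password
  report "a. PUT provider config" "$admin" PUT "$ADMIN_API/providerconfig/sandbox-providers" \
    -H 'Content-Type: application/xml' --data-binary @sandbox-providers.xml
  report "b. PUT descriptor" "$admin" PUT "$ADMIN_API/descriptors/simple-sandbox" \
    -H 'Content-Type: application/json' --data-binary @simple-sandbox_j.json
  sleep "${DEPLOY_WAIT:-10}"   # fixed pause; deployment is asynchronous
  report "c. GET via simple-sandbox" "$guest" GET "$KNOX_GW/simple-sandbox/webhdfs/v1/?op=LISTSTATUS"
  report "d. DELETE provider config (referenced)" "$admin" DELETE "$ADMIN_API/providerconfig/sandbox-providers"
  report "e. DELETE descriptor" "$admin" DELETE "$ADMIN_API/descriptors/simple-sandbox"
  report "f. DELETE provider config (unreferenced)" "$admin" DELETE "$ADMIN_API/providerconfig/sandbox-providers"
}

# Run with: step12
```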

Summary

Hopefully, the benefits of this new functionality are clear. Defining and deploying topologies for Ambari-managed Hadoop clusters is now easier and less error-prone. Provider configurations can now be shared by multiple topologies, reducing duplicate configuration and the associated potential for errors in managing changes. There are related UI enhancements coming soon, which will further ease the management of topologies, and continue the enhancement of Knox's usability.

Check out the User Guide for more details about these additions.
