About pzampino

pzampino · ‎01-30-2019

@abbas mohammadnejad You have to explicitly set the gateway.dispatch.whitelist property in gateway-site.xml, such that the pattern will match the endpoint address.
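
A candidate whitelist value can be roughly sanity-checked before the gateway is restarted. The sketch below assumes the property holds a semicolon-delimited list of regular expressions and that the whole endpoint URL must match one of them. The pattern and host names are made up for illustration, and Python's re module only approximates the Java regex engine that Knox uses.

```python
import re

def is_whitelisted(url: str, whitelist: str) -> bool:
    """Return True if url fully matches any pattern in the whitelist.

    whitelist is a semicolon-delimited list of regular expressions
    (assumed form of the gateway.dispatch.whitelist value).
    """
    patterns = [p.strip() for p in whitelist.split(";") if p.strip()]
    return any(re.fullmatch(p, url) for p in patterns)

# Hypothetical pattern: any host under example.com, or localhost, on any port.
WHITELIST = r"^https?://(.+\.example\.com|localhost):[0-9]+/?.*$"

if __name__ == "__main__":
    print(is_whitelisted("http://nn1.example.com:50070/webhdfs/v1/tmp", WHITELIST))
    print(is_whitelisted("http://nn1.other.org:50070/webhdfs/v1/tmp", WHITELIST))
```

If the endpoint address from the dispatch error does not match, widen the pattern until it does, then set it in gateway-site.xml.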

pzampino · ‎09-28-2018

Is your sandbox topology pointing to your cluster correctly? What does your gateway-audit.log show?

pzampino · ‎09-28-2018

Have you considered using the Ambari REST API, or are you concerned with clusters which are not managed by Ambari? You could use curl to invoke it from your scripts.
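
For scripts that prefer Python over curl, a minimal sketch of the same idea follows. It only builds the request; the host, cluster, service and component names, and the credentials are placeholders, and the resource path is the Ambari v1 component resource as I understand it, so check it against your Ambari version.

```python
import base64
import urllib.request
from urllib.parse import quote

def component_request(base_url, cluster, service, component, user, password):
    """Build a GET request for one service component in an Ambari cluster."""
    url = "{}/api/v1/clusters/{}/services/{}/components/{}".format(
        base_url.rstrip("/"), quote(cluster), quote(service), quote(component))
    # HTTP Basic authentication, equivalent to curl -u user:password
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})

if __name__ == "__main__":
    req = component_request("http://ambari.example.com:8080", "Sandbox",
                            "HDFS", "NAMENODE", "admin", "admin")
    print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen sends it; an HTTP error for a given component is then a reasonable signal that it is not present in that cluster.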

pzampino · ‎08-24-2018

Thank you for following up. That's what I suspected, and it's good to document it here for future reference.

pzampino · ‎08-24-2018

@Lian Jiang Can you explain why the default whitelist was not working for your deployment?

pzampino · ‎08-22-2018

None of Ambari, Zeppelin, or RangerUI are affected by this whitelisting. Can you see the default whitelist in gateway.log? It should say something like Applying a derived dispatch whitelist because none is configured in gateway-site: xxxxxxx

pzampino · ‎12-19-2017

Introduction

My Apache Knox Dynamic Service Endpoint Discovery article describes some exciting new functionality available in the 0.14.0 release of Apache Knox. The gateway is now able to dynamically determine the endpoint URLs of cluster services to proxy from Ambari. The associated benefits are described in that article.

Another benefit of this new functionality, which is not mentioned in that article, is the added ability to dynamically respond to cluster configuration changes that affect generated Knox topologies by re-generating and re-deploying those topologies. Without this, deployed topologies can be easily disabled when any of the proxied Hadoop services' configuration changes in the cluster.

Cluster Monitoring

When Knox deploys a simple topology descriptor, and generates a corresponding topology based on discovered cluster configuration details, it subsequently has the ability to monitor that cluster configuration for changes. When it discovers a change, it updates all of its topologies that are based on that modified cluster, and redeploys them. This has the potential to greatly reduce downtime for Knox due to cluster configuration changes.

For example, suppose a descriptor (docker-sandbox.json) is deployed, intended to proxy services in the HDP Docker Sandbox. Following the successful generation and deployment of the docker-sandbox topology, Knox can monitor the Sandbox cluster managed by Ambari. If an administrator were to update the dfs.namenode.http-address property value in the hdfs-site configuration, changing the port number for example, the Knox proxy for the WEBHDFS service would no longer work. However, if the Ambari cluster monitor is enabled, Knox would regenerate and redeploy the docker-sandbox topology, such that it would contain the correct port for the WEBHDFS service URL, and Knox clients would continue to work.
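
The monitoring idea can be pictured with a small sketch. This is not Knox's implementation; it is a hypothetical illustration that assumes each configuration type (such as hdfs-site) carries a version tag, and that the gateway remembers which cluster each generated topology was built from. All names here are invented for the example.

```python
def changed_config_types(previous, current):
    """Return the config types whose version tag differs between two polls."""
    return sorted(t for t in current if previous.get(t) != current[t])

def topologies_to_redeploy(topology_clusters, changed_cluster):
    """Return the topologies that were generated from the changed cluster."""
    return sorted(name for name, cluster in topology_clusters.items()
                  if cluster == changed_cluster)

if __name__ == "__main__":
    before = {"hdfs-site": "version1", "core-site": "version1"}
    after = {"hdfs-site": "version2", "core-site": "version1"}
    if changed_config_types(before, after):
        # Only topologies based on the modified cluster are regenerated.
        print(topologies_to_redeploy(
            {"docker-sandbox": "Sandbox", "prod": "Production"}, "Sandbox"))
```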
By default this monitor is disabled, but it can easily be enabled by setting the gateway.cluster.config.monitor.ambari.enabled property value to true in the gateway-site configuration.

    <property>
        <name>gateway.cluster.config.monitor.ambari.enabled</name>
        <value>true</value>
        <description>Enable/disable Ambari cluster configuration monitoring.</description>
    </property>

Also in the gateway-site configuration, there is a property for controlling the frequency with which Knox will check the clusters for which it has deployed topologies. For demonstration purposes, you may want to set this as low as 20 or 30 seconds.

    <property>
        <name>gateway.cluster.config.monitor.ambari.interval</name>
        <value>60</value>
        <description>The interval (in seconds) for polling Ambari for cluster changes.</description>
    </property>

Try It

The Apache Knox Dynamic Service Endpoint Discovery article includes instructions for deploying topologies using simple descriptors, employing service URL discovery. Starting from there, you can enable the Ambari cluster monitoring, and make a cluster configuration change like the one described in this article. Then, you'll see how Knox responds to the change, and adapts to continue providing the proxied WEBHDFS service to its clients.

1. Set the gateway.cluster.config.monitor.ambari.enabled property value to true in {GATEWAY_HOME}/conf/gateway-site.xml
2. Restart the gateway
3. Use Ambari to modify the hdfs-site dfs.namenode.http-address configuration property value as described in the example.
4. Allow the gateway to notice the configuration change (watch the {GATEWAY_HOME}/logs/gateway.log for the messages)
5. Review {GATEWAY_HOME}/conf/topologies/docker-sandbox.xml, and notice the change to the WEBHDFS service URL.

Your sandbox must expose the new port you specified for the dfs.namenode.http-address property for Knox to be able to access the new endpoint; otherwise, even though the topology will be correct, requests will fail due to connection failure.
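
The review in the last step can be scripted. The sketch below pulls the URLs for a given service role out of a topology XML document, assuming the usual topology layout of service elements with role and url children. The sample document and its port are invented for the example.

```python
import xml.etree.ElementTree as ET

def service_urls(topology_xml: str, role: str) -> list:
    """Return the URLs declared for the given service role in a topology."""
    root = ET.fromstring(topology_xml)
    urls = []
    for service in root.findall("service"):
        if service.findtext("role") == role:
            urls.extend(u.text.strip() for u in service.findall("url") if u.text)
    return urls

# Hypothetical excerpt of a regenerated docker-sandbox.xml
SAMPLE = """<topology>
  <generated>true</generated>
  <service>
    <role>WEBHDFS</role>
    <url>http://sandbox.hortonworks.com:50071/webhdfs</url>
  </service>
</topology>"""

if __name__ == "__main__":
    print(service_urls(SAMPLE, "WEBHDFS"))
```

Running it against the file before and after the Ambari change should show the port in the WEBHDFS URL following the new dfs.namenode.http-address value.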
Summary

While it doesn't take long to describe, this feature is a significant addition to the value provided by Knox. The ability to dynamically adapt to cluster service configuration changes reduces the effort required (and the potential for errors) by administrators when making such changes.

N.B., Statically-defined topologies (i.e., those deployed directly by a regular topology XML file) do NOT benefit from this monitoring support.

More details are available in the User Guide.

pzampino · ‎12-15-2017

Change 'sandbox' to whatever the image name is in your local image repository. Alternatively, you can specify the container ID.

pzampino · ‎12-13-2017

Introduction

The 0.14.0 release of Apache Knox includes the ability to dynamically determine topology endpoints for Hadoop services in Ambari-managed clusters. Prior to this release, users had to determine each of these endpoint URLs by navigating the Ambari UI (or combing through the various cluster configuration files), and explicitly add them to their topology descriptors; there was a lot of potential for human error. Support for a new, simplified topology descriptor has been added to leverage this dynamic endpoint discovery and facilitate provider configuration sharing across topologies. This is a dramatic improvement in the usability of Knox.

Simplified Descriptors

Simplified descriptors are a means to facilitate provider configuration sharing and service endpoint discovery. Rather than editing an XML topology descriptor, it’s now possible to create a simpler descriptor that declaratively specifies the desired contents of a topology, which will ultimately yield a full topology descriptor and corresponding deployment.

These simplified descriptors allow service URLs to be specified explicitly, just as full topology descriptors do. However, if URLs are omitted for a service, Knox will attempt to discover that service’s URLs from the Hadoop cluster. Currently, this behavior is only supported for clusters managed by Ambari.

Descriptor Properties

Property	Description
discovery-address	The endpoint address for the discovery source
discovery-type	The discovery source type. (Currently, the only supported type is AMBARI)
discovery-user	The username with permission to access the discovery source. If omitted, then Knox will check for an alias named ambari.discovery.user, and use its value if defined.
discovery-pwd-alias	The alias of the password for the user with permission to access the discovery source. If omitted, then Knox will check for an alias named ambari.discovery.password, and use its value if defined.
provider-config-ref	A reference to a provider configuration in {GATEWAY_HOME}/conf/shared-providers/
cluster	The name of the cluster from which the topology service endpoints should be determined
services	The collection of services to be included in the topology

File Formats

Two file formats are supported for two distinct purposes:

Format	Purpose
YAML	intended for the individual hand-editing a simplified descriptor (because of its readability and support for comments)
JSON	intended to be used for API interaction

YAML Example (based on the HDP Docker Sandbox)

    ---
    discovery-address : http://sandbox.hortonworks.com:8080
    discovery-user : maria_dev
    discovery-pwd-alias : ambari.discovery.password
    provider-config-ref : sandbox-providers
    cluster: Sandbox
    services:
        - name: NAMENODE
        - name: JOBTRACKER
        - name: WEBHDFS
        - name: WEBHCAT
        - name: OOZIE
        - name: WEBHBASE
        - name: HIVE
        - name: RESOURCEMANAGER

A Note About Aliases

This example illustrates the specification of credentials for the interaction with Ambari. If no credentials are specified, then the default aliases are queried. Use of the default aliases is sufficient for scenarios where topology discovery will only interact with a single Ambari instance. For multiple Ambari instances however, each will most likely require a different set of credentials. The discovery-user and discovery-pwd-alias properties exist for this purpose. Whether using the default credential aliases or specifying a custom password alias, these aliases must be defined prior to any attempt to deploy a topology using a simplified descriptor.

Externalized Provider Configurations

Sometimes, the same provider configuration is applied to multiple Knox topologies. Unlike XML topology descriptors, simplified descriptors do not contain provider configuration; rather, they contain references to external provider configuration.
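
To make the descriptor properties and the provider-config-ref reference concrete, here is a minimal sketch that assembles a simplified descriptor in the JSON format from the properties listed in the table. The function name and argument layout are hypothetical; only the property names come from this article.

```python
import json

def build_descriptor(discovery_address, cluster, provider_config_ref, services,
                     discovery_user=None, discovery_pwd_alias=None):
    """Assemble a simplified descriptor as a JSON string."""
    descriptor = {
        "discovery-address": discovery_address,
        "provider-config-ref": provider_config_ref,
        "cluster": cluster,
        # Services without explicit URLs are left for discovery to resolve.
        "services": [{"name": name} for name in services],
    }
    # Credentials are optional; when omitted, the default aliases apply.
    if discovery_user:
        descriptor["discovery-user"] = discovery_user
    if discovery_pwd_alias:
        descriptor["discovery-pwd-alias"] = discovery_pwd_alias
    return json.dumps(descriptor, indent=2)

if __name__ == "__main__":
    print(build_descriptor("http://sandbox.hortonworks.com:8080", "Sandbox",
                           "sandbox-providers", ["NAMENODE", "WEBHDFS"]))
```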
With the provider configuration externalized from the simple descriptors, a single configuration can be applied to multiple topologies. This helps reduce the duplication of configuration, and the need to update multiple configuration files when a policy change is required. Updating a provider configuration triggers an update to all those topologies that reference it.

The content of an externalized provider configuration is identical to the gateway element from a full topology descriptor. The only difference is that it’s defined in its own XML file in {GATEWAY_HOME}/conf/shared-providers/.

Monitored Directories

Effecting topology changes is as simple as modifying files in two specific directories.

The {GATEWAY_HOME}/conf/shared-providers/ directory is the location where Knox looks for provider configurations. This directory is monitored for changes, such that modifying a provider configuration file therein will trigger updates to any referencing simplified descriptors in the {GATEWAY_HOME}/conf/descriptors/ directory. Care should be taken when deleting these files if there are referencing descriptors; any subsequent modifications of referencing descriptors will fail when the deleted provider configuration cannot be found. The references should all be modified before deleting the provider configuration.

Likewise, the {GATEWAY_HOME}/conf/descriptors/ directory is monitored for changes, such that adding or modifying a simplified descriptor file in this directory will trigger the generation and deployment of a topology. Conversely, deleting a descriptor from this directory will result in the undeployment of the previously-generated topology.

Generated Topologies

Generated topology XML descriptors include an element to indicate the fact that they've been generated.

    <generated>true</generated>

These generated topology XML files should not be modified directly.
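
A script that touches topology files can check for this marker before editing anything. The following is a small sketch, assuming the generated element is a direct child of the topology root element; the sample documents are invented.

```python
import xml.etree.ElementTree as ET

def is_generated(topology_xml: str) -> bool:
    """Return True if the topology carries <generated>true</generated>."""
    root = ET.fromstring(topology_xml)
    value = root.findtext("generated", default="")
    return value.strip().lower() == "true"

if __name__ == "__main__":
    print(is_generated("<topology><generated>true</generated></topology>"))
    print(is_generated("<topology><gateway/></topology>"))
```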
Any changes that are made could potentially be overwritten as a result of a change to the source descriptor, a change to the cluster configuration, or a gateway restart. While deleting a generated topology file will result in an undeployment of that topology, any of the aforementioned changes could result in the regeneration and deployment of that topology.

The Admin API and Admin UI disallow modifications to generated topologies. The Admin API does provide the ability to modify simple descriptors and provider configurations, and the Admin UI will provide a similar capability in the future. The only reliable means to modify generated topologies is through changes to their respective source descriptors and provider configurations, either directly on the gateway host or using the Admin API.

Admin API

The Admin API has been augmented to support the management of provider configuration and simplified descriptor resources.

Get a list of the current provider configurations deployed to the gateway:
    /gateway/admin/api/v1/providerconfig

Get/Put/Delete the provider configuration identified by {id}:
    /gateway/admin/api/v1/providerconfig/{id}

Get a list of the current descriptors deployed to the gateway:
    /gateway/admin/api/v1/descriptors

Get/Put/Delete the descriptor identified by {id}:
    /gateway/admin/api/v1/descriptors/{id}

For more complete API details, see the Admin API section of the user guide.

Try It!

0. Install the HDP Sandbox (https://hortonworks.com/downloads/#sandbox)

1. Create the discovery aliases

    {GATEWAY_HOME}/bin/knoxcli.sh create-alias ambari.discovery.user --value maria_dev
    {GATEWAY_HOME}/bin/knoxcli.sh create-alias ambari.discovery.password --value maria_dev

2. Start the demo LDAP server and the gateway

    {GATEWAY_HOME}/bin/ldap.sh start
    {GATEWAY_HOME}/bin/gateway.sh start

3. Create/copy a provider config to the {GATEWAY_HOME}/conf/shared-providers/ directory

   Sample sandbox-providers.xml

    <gateway>
        <provider>
            <role>authentication</role>
            <name>ShiroProvider</name>
            <enabled>true</enabled>
            <param>
                <name>sessionTimeout</name>
                <value>30</value>
            </param>
            <param>
                <name>main.ldapRealm</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapRealm</value>
            </param>
            <param>
                <name>main.ldapContextFactory</name>
                <value>org.apache.hadoop.gateway.shirorealm.KnoxLdapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory</name>
                <value>$ldapContextFactory</value>
            </param>
            <param>
                <name>main.ldapRealm.userDnTemplate</name>
                <value>uid={0},ou=people,dc=hadoop,dc=apache,dc=org</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.url</name>
                <value>ldap://localhost:33389</value>
            </param>
            <param>
                <name>main.ldapRealm.contextFactory.authenticationMechanism</name>
                <value>simple</value>
            </param>
            <param>
                <name>urls./**</name>
                <value>authcBasic</value>
            </param>
        </provider>
    </gateway>

4. Create/copy a simple descriptor to the descriptors directory (you can use the YAML sample presented earlier in this article)

    cp simple-sandbox_y.yml {GATEWAY_HOME}/conf/descriptors/

5. Verify {GATEWAY_HOME}/logs/gateway.log and the contents of the {GATEWAY_HOME}/conf/topologies directory. There should be a file named simple-sandbox_y.xml in the topologies directory.

6. Test the deployed topology by invoking a request to a proxied Hadoop service

    curl -iku guest:guest-password 'https://localhost:8443/gateway/simple-sandbox_y/webhdfs/v1/?op=LISTSTATUS'

7. Modify the provider config

    touch {GATEWAY_HOME}/conf/shared-providers/sandbox-providers.xml

8. Test (check the timestamps of {GATEWAY_HOME}/conf/descriptors/simple-sandbox_y.yml and {GATEWAY_HOME}/conf/topologies/simple-sandbox_y.xml, and {GATEWAY_HOME}/logs/gateway.log to verify the regeneration and redeployment of the topology)

9. Modify the descriptor

    touch {GATEWAY_HOME}/conf/descriptors/simple-sandbox_y.yml

10. Check the timestamp of {GATEWAY_HOME}/conf/topologies/simple-sandbox_y.xml, and {GATEWAY_HOME}/logs/gateway.log to verify the regeneration and redeployment of the topology.

11. Delete the simple descriptor from {GATEWAY_HOME}/conf/descriptors (Verify the removal of {GATEWAY_HOME}/conf/topologies/simple-sandbox_y.xml, and check {GATEWAY_HOME}/logs/gateway.log to verify undeployment of the topology)

12. Repeat steps 3-11 using the Admin API instead of filesystem copies

   a. Deploy the provider configuration to the gateway:

    curl -iku admin:admin-password https://localhost:8443/gateway/admin/api/v1/providerconfig/sandbox-providers -X PUT -H Content-Type:application/xml -d "@sandbox-providers.xml"

   b. The API requires descriptors to be in the JSON format:

      simple-sandbox_j.json

    {
      "discovery-address":"http://localhost:8080",
      "provider-config-ref":"sandbox-providers",
      "cluster":"Sandbox",
      "services":[
        {"name":"NAMENODE"},
        {"name":"JOBTRACKER"},
        {"name":"WEBHDFS"},
        {"name":"WEBHCAT"},
        {"name":"OOZIE"},
        {"name":"WEBHBASE"},
        {"name":"RESOURCEMANAGER"}
      ]
    }

      Deploy the JSON descriptor to the gateway:

    curl -iku admin:admin-password https://localhost:8443/gateway/admin/api/v1/descriptors/simple-sandbox -X PUT -H Content-Type:application/json -d "@simple-sandbox_j.json"

   c. Test the resulting deployed topology

    curl -iku guest:guest-password 'https://localhost:8443/gateway/simple-sandbox/webhdfs/v1/?op=LISTSTATUS'

   d. Try to delete the provider configuration (It should be disallowed because simple-sandbox_j.json references it):

    curl -iku admin:admin-password 'https://localhost:8443/gateway/admin/api/v1/providerconfig/sandbox-providers' -X DELETE

   e. Delete the referencing descriptor:

    curl -iku admin:admin-password 'https://localhost:8443/gateway/admin/api/v1/descriptors/simple-sandbox' -X DELETE

   f. Then, try to delete the provider configuration again. It should succeed this time because there are no referencing descriptors.

Summary

Hopefully, the benefits of this new functionality are clear. Defining and deploying topologies for Ambari-managed Hadoop clusters is now easier and less error-prone. Provider configurations can now be shared by multiple topologies, reducing duplicate configuration and the associated potential for errors in managing changes. There are related UI enhancements coming soon, which will further ease the management of topologies, and continue the enhancement of Knox's usability.

Check out the User Guide for more details about these additions.
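
The Admin API calls used in the Try It steps can also be prepared from Python instead of curl. The sketch below only builds the requests, using the providerconfig and descriptors resource paths given in the Admin API section; the helper name is hypothetical, and the gateway address and credentials are the demo values from this article.

```python
import base64
import urllib.request

ADMIN_API = "/gateway/admin/api/v1"

def admin_request(gateway_url, resource, resource_id, method, user, password,
                  body=None, content_type=None):
    """Build a request for a provider configuration or descriptor resource."""
    if resource not in ("providerconfig", "descriptors"):
        raise ValueError("unsupported resource: " + resource)
    url = "{}{}/{}/{}".format(gateway_url.rstrip("/"), ADMIN_API, resource, resource_id)
    # HTTP Basic authentication, equivalent to curl -u user:password
    token = base64.b64encode("{}:{}".format(user, password).encode()).decode()
    headers = {"Authorization": "Basic " + token}
    if content_type:
        headers["Content-Type"] = content_type
    return urllib.request.Request(url, data=body, headers=headers, method=method)

if __name__ == "__main__":
    put = admin_request("https://localhost:8443", "descriptors", "simple-sandbox",
                        "PUT", "admin", "admin-password",
                        body=b'{"cluster":"Sandbox"}', content_type="application/json")
    delete = admin_request("https://localhost:8443", "providerconfig",
                           "sandbox-providers", "DELETE", "admin", "admin-password")
    print(put.get_method(), put.full_url)
    print(delete.get_method(), delete.full_url)
```

Sending is left to the caller, for example with urllib.request.urlopen. The curl examples use -k to skip certificate verification for the demo gateway's self-signed certificate, so an equivalent SSL context would be needed here as well.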

pzampino · ‎10-06-2017

This was a big help to me in getting the Docker sandbox up and running. The Sandbox doc says that a bunch of services are started automatically, but alas, they are not. Once I read your instructions for starting the service processes, the mystery was solved. Thanks!

Last Visited	‎07-16-2024 11:06 AM

Member Since	‎09-08-2017 05:08 PM
Posts	27
Kudos received	11

Cloudera Community

Re: HDP3.0: knox fails to dispatch webhdfs request...

Re: HDP3.0: knox fails to dispatch webhdfs request...

Re: Access hadoop with knox http 500 error

Re: How to check if specific set of component is i...

Re: HDP3.0: knox fails to dispatch webhdfs request...

Re: HDP3.0: knox fails to dispatch webhdfs request...

Re: HDP3.0: knox fails to dispatch webhdfs request...

Apache Knox Ambari Cluster Monitoring

Re: Installing Docker Version of Sandbox on Mac

Apache Knox Dynamic Service Endpoint Discovery

Re: Installing Docker Version of Sandbox on Mac