Update 10.12.2016: Added filter to rewrite proxyUser as authenticated user
Update 25.01.2017: Improved service.xml and rewrite.xml

1 Configure a new service for Livy Server in Knox

1.1 Create a service definition

$ sudo mkdir -p /usr/hdp/current/knox-server/data/services/livy/0.1.0/ 
$ sudo chown -R knox:knox /usr/hdp/current/knox-server/data/services/livy

Create a file /usr/hdp/current/knox-server/data/services/livy/0.1.0/service.xml with

<service role="LIVYSERVER" name="livy" version="0.1.0">
  <routes>
    <route path="/livy/v1/sessions">
        <rewrite apply="LIVYSERVER/livy/addusername/inbound" to="request.body"/>
    </route>
    <route path="/livy/v1/**?**"/>
    <route path="/livy/v1"/>
    <route path="/livy/v1/"/>
  </routes>
</service>

Note that the value of the name attribute ("livy") and the directory name in the path .../services/livy/... must be the same.
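
A mismatch between the two can be caught with a small script before Knox is restarted. The following is only a sketch and not part of Knox: the function name service_name_matches_dir is made up for this illustration, and it assumes the directory layout .../services/<name>/<version>/service.xml used above.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def service_name_matches_dir(service_xml: str) -> bool:
    """Return True if the name attribute of <service> equals the name of
    the service directory (.../services/<name>/<version>/service.xml)."""
    path = Path(service_xml)
    root = ET.parse(str(path)).getroot()
    # path.parent is the version directory, path.parent.parent the service directory
    return root.get("name") == path.parent.parent.name
```

Calling it with /usr/hdp/current/knox-server/data/services/livy/0.1.0/service.xml should return True for the definition shown above.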

The route /livy/v1/sessions gets special treatment because it handles the POST request that creates a Livy session. The request body looks like this, for example:

{"driverMemory":"2G","executorCores":4,"executorMemory":"8G","proxyUser":"bernhard","conf":{"spark.master":"yarn-cluster","spark.jars.packages":"com.databricks:spark-csv_2.10:1.5.0"}}

Livy server will use proxyUser to run the Spark session. To prevent a user from supplying an arbitrary user here (e.g. a more privileged one), Knox needs to rewrite the JSON body and replace whatever value proxyUser has with the username of the authenticated user; see the next section.
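
The intended effect of this rewrite on the request body can be illustrated in a few lines of Python. This is a sketch of the outcome only, not of how Knox implements it; the function name replace_proxy_user and the sample user names are made up. It assumes that only an existing proxyUser key is overwritten, which matches what is reported in the comments below (a missing key is not added).

```python
import json


def replace_proxy_user(body: str, authenticated_user: str) -> str:
    """Overwrite an existing proxyUser value with the authenticated user."""
    payload = json.loads(body)
    # Assumption: a missing proxyUser key is left alone, it is not added.
    if "proxyUser" in payload:
        payload["proxyUser"] = authenticated_user
    return json.dumps(payload)


# A client asks for a more privileged user; the gateway forces its own login name.
request_body = '{"driverMemory": "2G", "proxyUser": "admin"}'
print(replace_proxy_user(request_body, "bernhard"))
```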

1.2 Create a rewrite rule definition

Create a file /usr/hdp/current/knox-server/data/services/livy/0.1.0/rewrite.xml with

<rules>
  <rule name="LIVYSERVER/livy/user-name">
    <rewrite template="{$username}"/>
  </rule>
  <rule dir="IN" name="LIVYSERVER/livy/root/inbound" pattern="*://*:*/**/livy/v1">
    <rewrite template="{$serviceUrl[LIVYSERVER]}"/>
  </rule>
  <rule dir="IN" name="LIVYSERVER/livy/path/inbound" pattern="*://*:*/**/livy/v1/{path=**}?{**}">
    <rewrite template="{$serviceUrl[LIVYSERVER]}/{path=**}?{**}"/>
  </rule>
  <filter name="LIVYSERVER/livy/addusername/inbound">
    <content type="*/json">
      <apply path="$.proxyUser" rule="LIVYSERVER/livy/user-name"/>
    </content>
  </filter>
</rules>

Note: The "v1" is only introduced to allow calls to Livy server without any path. (It seems to be a limitation in Knox that at least one path element needs to be present in the mapped URL.)

The rule LIVYSERVER/livy/user-name and the filter LIVYSERVER/livy/addusername/inbound are responsible for injecting the authenticated user name, as described in the previous section.
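
To make the two inbound URL rules more concrete, here is a rough Python approximation of the mapping they describe: everything up to and including /livy/v1 is replaced by the service URL, while the remaining path and the query string are kept. It is a simplified sketch, not Knox's actual pattern matcher; the function name map_inbound and the host names are hypothetical.

```python
from urllib.parse import urlsplit


def map_inbound(gateway_url: str, service_url: str) -> str:
    """Approximate the inbound rewrite from a gateway URL to the Livy URL."""
    parts = urlsplit(gateway_url)
    marker = "/livy/v1"
    index = parts.path.find(marker)
    if index < 0:
        raise ValueError("not a Livy gateway URL: " + gateway_url)
    rest = parts.path[index + len(marker):]
    if rest and not rest.startswith("/"):
        # e.g. /livy/v10 is a different path, not /livy/v1 plus a sub-path
        raise ValueError("not a Livy gateway URL: " + gateway_url)
    target = service_url.rstrip("/") + rest
    if parts.query:
        target += "?" + parts.query
    return target


# Hypothetical hosts, for illustration only.
print(map_inbound("https://knox-host:8443/gateway/default/livy/v1/sessions/0?from=0",
                  "http://livy-host:8998"))
```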

1.3 Publish the new service via Ambari

Go to the Knox configuration in Ambari and add the following at the end of Advanced Topology:

    <topology>
    ...
        <service>
            <role>LIVYSERVER</role>
            <url>http://<livy-server>:8998</url>
        </service>
    </topology>

2 Use with Sparkmagic

Sparkmagic can be configured to use Livy Server in HDP 2.5, see Using Jupyter with Sparkmagic and Livy Server on HDP 2.5

To connect via Knox, change endpoint definition in %manage_spark

10184-knox-add-endpoint.png

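Just replace the Livy server URL with the Knox gateway URL, which includes the gateway path and the topology name, e.g. https://<knox-gateway>:8443/gateway/default/livy/v1 for the default topology.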
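
To check the Knox endpoint independently of Sparkmagic, a session request can be put together by hand. The sketch below only builds the request with the Python standard library and mirrors the curl calls shown in the comments; it assumes basic authentication and a topology named default, and the function name build_session_request is made up for this illustration.

```python
import base64
import json
import urllib.request


def build_session_request(gateway: str, user: str, password: str,
                          topology: str = "default") -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a Livy session via Knox."""
    url = f"https://{gateway}:8443/gateway/{topology}/livy/v1/sessions"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    # proxyUser is included so that the Knox filter has a value to rewrite.
    body = json.dumps({"kind": "spark", "proxyUser": user}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )
```

The returned object can be passed to urllib.request.urlopen; with a self-signed Knox certificate an SSL context that accepts it would be needed as well.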

If Knox does not have a valid certificate for HTTPS requests, reconfigure Sparkmagic's config.json and set

 "ignore_ssl_errors": true

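The setting can also be changed with a short script instead of editing the file by hand. This is a minimal sketch: the function name set_ignore_ssl_errors is made up, and it assumes the configuration is a plain JSON file (Sparkmagic usually reads it from ~/.sparkmagic/config.json). Note that it is the value true that makes Sparkmagic skip certificate verification.

```python
import json
from pathlib import Path


def set_ignore_ssl_errors(config_path: str, ignore: bool) -> dict:
    """Set ignore_ssl_errors in a Sparkmagic config file, keeping all other keys."""
    path = Path(config_path).expanduser()
    # Start from the existing settings if the file is already there.
    config = json.loads(path.read_text()) if path.exists() else {}
    config["ignore_ssl_errors"] = ignore
    path.write_text(json.dumps(config, indent=2))
    return config
```

For example, set_ignore_ssl_errors("~/.sparkmagic/config.json", True) would switch off certificate checks for all endpoints, so it is best limited to test setups.
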
Credits

Thanks to Kevin Minder for the article About Adding a service to Apache Knox

Comments
New Contributor

@Bernhard Walter @Kevin Minder Thanks for sharing this. Works great. But the Livy job is getting submitted as "knox". I looked into the rewrite rule definition section, and I have everything exactly like that in my rewrite.xml, but it is still getting submitted as "knox".

Is there anything additional that needs to be done on the Knox side to submit the Livy job as the user who submits it, not as "knox"?

What we see in a kerberised cluster is that Livy needs to be able to impersonate other roles.

Then, when Knox forwards the request to Livy with "doAs=<authc-user>", Livy starts the job as the authenticated user.

To be on the safe side, the Knox rewrite rule also replaces the proxyUser with the authenticated user.

New Contributor

@Bernhard Walter @Kevin Minder Thanks for sharing this article. I followed the same steps and was able to configure it properly, but I am getting an HTTP 401 error while querying it. Please find the output below for your reference:

spark@XXXXXXX(/home/spark)$ curl --negotiate -X GET -u : "http://xxxxxx:8998/"
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Metrics</title>
</head>
<body>
  <h1>Operational Menu</h1>
  <ul>
    <li><a href="/metrics?pretty=true">Metrics</a></li>
    <li><a href="/ping">Ping</a></li>
    <li><a href="/threads">Threads</a></li>
    <li><a href="/healthcheck?pretty=true">Healthcheck</a></li>
  </ul>
</body>
</html>

This works without any issue.

spark@xxxxxxx(/home/spark)$ curl --negotiate -X GET -u : "http://xxxxxxx:8443/gateway/default/livy/v1"

This results in HTTP 401 (found in the Knox gateway audit logs).

Am I missing something? Please help me fix this.

Maybe check whether you can access WebHDFS via Knox to see if your kinit user is accepted by Knox

New Contributor

Hi @Bernhard Walter, First of all thanks for your help.

You are right. I wasn't able to access WebHDFS via Knox. It seems our web UIs and HTTP requests are still using simple/anonymous authentication while Knox is configured to use Kerberos, so any request to WebHDFS, Livy and HBase was failing with an HTTP 401 error.

However, we are able to make HTTP requests to Hive and WebHCat. Do you have any idea why?

New Contributor

Could you expand on what configurations are required for knox to automatically send the authenticated user as a proxyuser to the call?

For example,

curl -u user:password -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" https://hostname:8443/gateway/default/livy/v1/sessions -k

creates a livy process with proxyUser: knox, while:

curl -u user:password -X POST --data '{"kind": "spark", "proxyUser": "user"}' -H "Content-Type: application/json" https://hostname:8443/gateway/default/livy/v1/sessions -k

creates a livy process with proxyUser: user, but here I specifically point to "user" as the proxyUser in the call. Is it possible for knox to do this automatically?

I've followed the configs from the article. The knox user is set as a Livy superuser and was added the same way I added hue to the configs, and hue sends the correct proxy settings.

New Contributor

KNOX-1098 is the support for adding proxyUser when it is not there. This hasn't been merged yet.

New Contributor

Hello, what do you mean by "knox user is set as livy superuser"? With this setup I end up with an error saying that user knox is not allowed to impersonate the proxyUser given in the JSON payload. However, Knox is properly configured for other usages (WebHDFS, etc.) and Livy is able to impersonate properly. Do you have any clue about this error?

Revision 2 of 2. Last update: 08-17-2019 07:32 AM