Update 10.12.2016: Added filter to rewrite proxyUser as authenticated user
Update 25.01.2017: Improved service.xml and rewrite.xml

1 Configure a new service for Livy Server in Knox

1.1 Create a service definition

$ sudo mkdir -p /usr/hdp/current/knox-server/data/services/livy/0.1.0/ 
$ sudo chown -R knox:knox /usr/hdp/current/knox-server/data/services/livy

Create a file /usr/hdp/current/knox-server/data/services/livy/0.1.0/service.xml with

<service role="LIVYSERVER" name="livy" version="0.1.0">
  <routes>
    <route path="/livy/v1/sessions">
        <rewrite apply="LIVYSERVER/livy/addusername/inbound" to="request.body"/>
    </route>
    <route path="/livy/v1/**?**"/>
    <route path="/livy/v1"/>
    <route path="/livy/v1/"/>
  </routes>
</service>

Note that the name attribute ("livy") and the directory name in the path .../services/livy/... must be the same.
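
The naming constraint can be checked mechanically. The following is a minimal sketch, not part of Knox: the helper name `service_name_matches_dir` is hypothetical, and it assumes the directory layout .../services/<name>/<version>/service.xml used above.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def service_name_matches_dir(service_xml: Path) -> bool:
    """Return True if <service name="..."> equals the service directory name.

    Assumes the layout .../services/<name>/<version>/service.xml.
    """
    root = ET.parse(service_xml).getroot()
    # parent is the version directory, parent.parent the service directory
    return root.get("name") == service_xml.parent.parent.name
```

For the files created above, the call would be `service_name_matches_dir(Path("/usr/hdp/current/knox-server/data/services/livy/0.1.0/service.xml"))`.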

The route /livy/v1/sessions gets special treatment because it handles the POST request that creates a Livy session. The request body looks like this, for example:

{"driverMemory":"2G","executorCores":4,"executorMemory":"8G","proxyUser":"bernhard","conf":{"spark.master":"yarn-cluster","spark.jars.packages":"com.databricks:spark-csv_2.10:1.5.0"}}

Livy server uses proxyUser to run the Spark session. To prevent a user from supplying an arbitrary user here (e.g. a more privileged one), Knox needs to rewrite the JSON body and replace whatever value proxyUser has with the username of the authenticated user; see the next section.
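
The intended effect of this rewrite can be illustrated with a small sketch. It is not Knox's implementation, only a model of the behaviour described here; the function name `rewrite_proxy_user` is hypothetical. As reported in the comments below, an existing proxyUser is replaced, while a request without the field is left unchanged.

```python
import json


def rewrite_proxy_user(body: str, authenticated_user: str) -> str:
    """Model the effect of the filter on a session-creation request body."""
    payload = json.loads(body)
    # Only an existing field is overwritten; a missing one is not added
    # (behaviour reported in the comments, see KNOX-1098).
    if "proxyUser" in payload:
        payload["proxyUser"] = authenticated_user
    return json.dumps(payload)
```

With this model, a body that asks for `"proxyUser": "root"` sent by the authenticated user bernhard reaches Livy with `"proxyUser": "bernhard"`.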

1.2 Create a rewrite rule definition

Create a file /usr/hdp/current/knox-server/data/services/livy/0.1.0/rewrite.xml with

<rules>
  <rule name="LIVYSERVER/livy/user-name">
    <rewrite template="{$username}"/>
  </rule>
  <rule dir="IN" name="LIVYSERVER/livy/root/inbound" pattern="*://*:*/**/livy/v1">
    <rewrite template="{$serviceUrl[LIVYSERVER]}"/>
  </rule>
  <rule dir="IN" name="LIVYSERVER/livy/path/inbound" pattern="*://*:*/**/livy/v1/{path=**}?{**}">
    <rewrite template="{$serviceUrl[LIVYSERVER]}/{path=**}?{**}"/>
  </rule>
  <filter name="LIVYSERVER/livy/addusername/inbound">
    <content type="*/json">
      <apply path="$.proxyUser" rule="LIVYSERVER/livy/user-name"/>
    </content>
  </filter>
</rules>

Note: The "v1" is only introduced to allow calls to Livy server without any path. (It seems to be a limitation in Knox that at least one path element needs to be present in the mapped URL.)

The rule LIVYSERVER/livy/user-name and the filter LIVYSERVER/livy/addusername/inbound are responsible for injecting the authenticated user name, as described in the previous section.
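
The two inbound URL rules can be approximated as follows. This is only a sketch of the mapping they describe, not Knox's template engine; `map_inbound` and the backend address `http://livy-server:8998` are assumptions for illustration.

```python
import re
from urllib.parse import urlsplit

# Hypothetical backend URL; in Knox it comes from the topology's <url> element.
LIVY_URL = "http://livy-server:8998"


def map_inbound(gateway_url: str, service_url: str = LIVY_URL) -> str:
    """Drop everything up to /livy/v1 and prepend the backend service URL,
    keeping the remaining path and the query string."""
    parts = urlsplit(gateway_url)
    match = re.search(r"/livy/v1(?:/(.*))?$", parts.path)
    if match is None:
        raise ValueError("not a Livy gateway URL: " + gateway_url)
    rest = match.group(1) or ""
    mapped = service_url + ("/" + rest if rest else "")
    return mapped + ("?" + parts.query if parts.query else "")
```

Under these assumptions, a gateway URL ending in /livy/v1/sessions maps to http://livy-server:8998/sessions, and a URL ending in /livy/v1 maps to the bare service URL.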

1.3 Publish the new service via Ambari

Go to the Knox configuration and add the following at the end of Advanced Topology:

    <topology>
    ...
        <service>
            <role>LIVYSERVER</role>
            <url>http://<livy-server>:8998</url>
        </service>
    </topology>
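
Once the topology is deployed, a client can create a Livy session through the gateway. The sketch below only builds the request and does not send it. It assumes that the topology accepts HTTP Basic authentication and that the gateway URL has the form https://hostname:8443/gateway/default/livy/v1 (the form used in the curl examples in the comments below); `build_session_request` is a hypothetical helper name.

```python
import base64
import json
import urllib.request


def build_session_request(gateway_url, user, password, proxy_user):
    """Build (but do not send) a POST that creates a Livy session via Knox."""
    body = json.dumps({"kind": "spark", "proxyUser": proxy_user}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/sessions",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + token,
        },
    )
```

The request can then be sent with `urllib.request.urlopen(req)`. Including proxyUser in the body gives the rewrite filter a field to replace.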

2 Use with Sparkmagic

Sparkmagic can be configured to use Livy Server in HDP 2.5, see Using Jupyter with Sparkmagic and Livy Server on HDP 2.5

To connect via Knox, change endpoint definition in %manage_spark

[Screenshot: 10184-knox-add-endpoint.png]

Just replace Livy server URL with Knox gateway URL https://<knox-gateway>:8443/livy/v1

If Knox does not have a valid certificate for HTTPS requests, reconfigure Sparkmagic's config.json and set

 "ignore_ssl_errors": true

Credits

Thanks to Kevin Minder for the article About Adding a service to Apache Knox

Comments
avatar

@Bernhard Walter @Kevin Minder Thanks for sharing this. Works great. But the Livy job is getting submitted as "knox". I looked into the rewrite rule definition section, and my rewrite.xml matches it exactly, but the job is still submitted as "knox".

Is there anything else that needs to be done on the Knox side so that the Livy job is submitted as the user who submits it, not as "knox"?

avatar

What we see in a kerberised cluster is that Livy needs to be able to impersonate other roles.

Then, when Knox forwards the request to Livy with "doAs=<authc-user>", Livy starts the job as the authenticated user.

To be on the safe side, the Knox rewrite rule also replaces the proxyUser with the authenticated user.

avatar

@Bernhard Walter @Kevin Minder Thanks for sharing this article. I followed the same steps and was able to configure it properly, but I am getting an HTTP 401 error when querying it. Please find the output below for your reference:

spark@XXXXXXX(/home/spark)$ curl --negotiate -X GET -u : "http://xxxxxx:8998/"
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
  <title>Metrics</title>
</head>
<body>
  <h1>Operational Menu</h1>
  <ul>
    <li><a href="/metrics?pretty=true">Metrics</a></li>
    <li><a href="/ping">Ping</a></li>
    <li><a href="/threads">Threads</a></li>
    <li><a href="/healthcheck?pretty=true">Healthcheck</a></li>
  </ul>
</body>
</html>

This works without any issue.

spark@xxxxxxx(/home/spark)$ curl --negotiate -X GET -u : "http://xxxxxxx:8443/gateway/default/livy/v1"

This results in HTTP 401 (found in the Knox gateway audit logs).

Am I missing something? Please help me fix this.

avatar

Maybe check whether you can access WebHDFS via Knox to see if your kinit user is accepted by Knox

avatar

Hi @Bernhard Walter, First of all thanks for your help.

You are right. I wasn't able to access WebHDFS via Knox. It seems our web UIs and HTTP requests are still using simple/anonymous authentication while Knox is configured to use Kerberos, so any request to WebHDFS, Livy and HBase failed with an HTTP 401 error.

However we are able to make HTTP requests to hive and webhcat. Do you have any idea why?

avatar
Contributor

Could you expand on what configurations are required for knox to automatically send the authenticated user as a proxyuser to the call?

For example,

curl -u user:password -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" https://hostname:8443/gateway/default/livy/v1/sessions -k

creates a livy process with proxyUser: knox, while:

curl -u user:password -X POST --data '{"kind": "spark", "proxyUser": "user"}' -H "Content-Type: application/json" https://hostname:8443/gateway/default/livy/v1/sessions -k

creates a livy process with proxyUser: user, but here I specifically point to "user" as the proxyUser in the call. Is it possible for knox to do this automatically?

I've followed the configs from the article. The knox user is set as a Livy superuser and was added the same way I added hue to the configs; hue sends the correct proxy settings.

avatar
Rising Star

KNOX-1098 is the support for adding proxyUser when it is not there. This hasn't been merged yet.

avatar
New Contributor

Hello, what do you mean by "knox user is set as livy superuser"? With this setup I end up with an error saying that user knox is not allowed to impersonate the proxyUser given in the JSON payload. However, Knox is properly configured for other usages (WebHDFS, etc.) and Livy is able to impersonate properly. Do you have any clue about this error?

avatar
Contributor

I have the following issue with this setup.

I define the Livy service in a Knox topology with the authentication provider enabled.

When I request a Livy session over the Knox URL, Knox requests the Livy session with doAs=myuser. So far so good.

The Livy session is started with owner=knox and proxyUser=myuser.

The problem arises when we attempt to post to the Livy statements API over the Knox URL.

If we use the Knox URL to post to the running Livy session, Knox adds doAs=myuser, but now we get a forbidden response. Basically, because the Livy session is owned by knox, we cannot post a statement into the session over the Knox URL with doAs=myuser. In my setup at least, only the knox user may post a statement to a Livy session owned by knox.