Installing and configuring Livy on CDH 6.x.x

Livy is a preferred way to run Spark jobs on several Hadoop installations, but not on CDH. While preparing for a CDP migration, the team behind one of our use cases switched to Apache Airflow so that they could run jobs without an edge node (or "bastion node"). They wanted to begin using Airflow before the CDP migration, so they asked me to install Livy on a CDH edge node.

 

A search online for Livy on CDH returned little helpful information, but I did find information on how to download and install it at https://livy.apache.org/

Step 1: Determine which account will be used to run Livy

By default, Linux lets an application read or execute any file that the account running it can access, unless SELinux or other access-management software restricts it. Livy gets the same access as the account that runs it, so pick that account deliberately.

Step 2: Set up a keytab

You'll need a Kerberos principal for the account. If you use Active Directory principals with your CDH deployment, this account will be outside of your Hadoop platform. You can use a tool like ktutil to create a keytab for your Kerberos principal.
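
As a rough sketch of the ktutil step, assuming the MIT Kerberos ktutil and a password-based principal: the helper below only prints the commands you would type at the ktutil prompt. The function name, the principal livy@EXAMPLE.COM, the keytab path, the key version number, and the encryption type are all placeholders, and the last two must match what your KDC or Active Directory actually uses.

```shell
# Hypothetical helper: print the commands to enter at the ktutil prompt.
# ktutil asks for the principal's password after the addent line.
ktutil_commands() {
    principal="$1"   # full principal name (placeholder in the example below)
    keytab="$2"      # where ktutil should write the keytab (placeholder)
    kvno="${3:-1}"   # key version number; assumed, check your KDC
    enctype="${4:-aes256-cts-hmac-sha1-96}"   # assumed encryption type
    printf 'addent -password -p %s -k %s -e %s\n' "$principal" "$kvno" "$enctype"
    printf 'wkt %s\n' "$keytab"
    printf 'quit\n'
}

ktutil_commands livy@EXAMPLE.COM /tmp/livy.keytab
```

After the keytab is written, klist -kt <keytab> lists its entries, and kinit -kt <keytab> <principal> confirms that it can actually obtain a ticket. Restricting the file to the Livy account (for example with chmod 600) is a sensible precaution.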

Step 3: Set up your server to run Livy

Livy requires the basic Hadoop and Spark environment variables.

 

export JAVA_HOME=/usr/java/default/jre
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
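
Before starting Livy, it can help to confirm that each of these variables is set and points at a directory that exists on the edge node. The check below is a minimal sketch; the function name check_livy_env is my own and is not part of Livy or CDH.

```shell
# Report any of the four variables that is unset or not an existing directory.
check_livy_env() {
    status=0
    for name in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR SPARK_HOME; do
        # Indirect lookup of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "$name is not set"
            status=1
        elif [ ! -d "$value" ]; then
            echo "$name points to a missing directory: $value"
            status=1
        fi
    done
    return "$status"
}

check_livy_env || echo "Fix the variables above before starting Livy"
```

Run it as the account chosen in Step 1, since that account's environment is the one Livy will see.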

 

Step 4: Download and install Livy

  1. Download the Livy package zip file from https://livy.apache.org/download/ using wget.
    Example:
    cd /var/tmp
    wget https://dlcdn.apache.org/incubator/livy/0.7.1-incubating/apache-livy-0.7.1-incubating-bin.zip
  2. Unzip the resulting zip file
    unzip apache-livy-0.7.1-incubating-bin.zip
  3. Move the package to /opt/livy and create a "default" symlink that points to it.
    mkdir /opt/livy
    mv /var/tmp/apache-livy-0.7.1-incubating-bin /opt/livy/
    ln -s /opt/livy/apache-livy-0.7.1-incubating-bin /opt/livy/default
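
The same steps can be collected into one parameterized script so that a later version only needs a one-line change. This is a sketch, not a tested installer: the LIVY_VERSION variable and the function names are my own, and the URL pattern is simply copied from the example above.

```shell
# Version string taken from the example above; change as needed.
LIVY_VERSION="${LIVY_VERSION:-0.7.1-incubating}"

# Build the package name and download URL from the version.
livy_package() { echo "apache-livy-$1-bin"; }
livy_url() { echo "https://dlcdn.apache.org/incubator/livy/$1/$(livy_package "$1").zip"; }

# Hypothetical wrapper around the manual steps; stops at the first failure.
install_livy() {
    pkg="$(livy_package "$LIVY_VERSION")"
    cd /var/tmp || return 1
    wget "$(livy_url "$LIVY_VERSION")" || return 1
    unzip -q "$pkg.zip" || return 1
    mkdir -p /opt/livy || return 1
    mv "/var/tmp/$pkg" /opt/livy/ || return 1
    # -sfn replaces an existing "default" link instead of nesting inside it.
    ln -sfn "/opt/livy/$pkg" /opt/livy/default
}

# Show the URL that install_livy would download.
livy_url "$LIVY_VERSION"
```

Call install_livy with sufficient privileges to write to /opt. If the URL no longer resolves, take the current link from the Livy download page rather than guessing at it.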

Step 5: Set up livy.conf

  1. Livy ships template files that you copy to "real" configuration files. For this setup, the file to configure is livy.conf.
    cp /opt/livy/default/conf/livy.conf.template /opt/livy/default/conf/livy.conf
  2. Edit the newly created livy.conf file and add two lines in the commented-out Kerberos section.
    livy.server.launch.kerberos.principal=${KERBEROS_PRINCIPAL}
    livy.server.launch.kerberos.keytab=${KERBEROS_KEYTAB}
  3. Replace the two placeholders with the full Kerberos principal name and the full path to the keytab.
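
If you would rather script the edit than make it by hand, something like the following could append the two settings. It is a sketch under assumptions: the helper name, the principal, and the keytab path are placeholders, and it appends to the end of the file rather than editing the commented-out Kerberos section.

```shell
# Hypothetical helper: append the two launch settings to a livy.conf file.
set_livy_kerberos() {
    conf="$1"        # path to livy.conf
    principal="$2"   # full principal name (placeholder in the demo below)
    keytab="$3"      # full path to the keytab (placeholder in the demo below)
    case "$keytab" in
        /*) ;;   # absolute path, as Step 5 requires
        *) echo "keytab path must be absolute: $keytab" >&2; return 1 ;;
    esac
    {
        echo "livy.server.launch.kerberos.principal=$principal"
        echo "livy.server.launch.kerberos.keytab=$keytab"
    } >> "$conf"
}

# Demo against a scratch file so nothing real is modified.
demo_conf="$(mktemp)"
set_livy_kerberos "$demo_conf" livy@EXAMPLE.COM /etc/security/keytabs/livy.keytab
cat "$demo_conf"
```

For the real file, the first argument would be /opt/livy/default/conf/livy.conf.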

Step 6: Run the Livy server

The start command below runs the Livy server as a background process. This article doesn't discuss how to run it as a service that starts automatically.

 

/opt/livy/default/bin/livy-server start
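
As far as I know, the same livy-server script also accepts stop and status, which is handy when you change livy.conf and need to restart. The wrapper below is only a convenience sketch; the name livy_ctl and the LIVY_HOME default are my own choices.

```shell
# Assumed install location from Step 4; override LIVY_HOME if yours differs.
LIVY_HOME="${LIVY_HOME:-/opt/livy/default}"

# Thin wrapper around the livy-server script; accepts start, stop, or status.
livy_ctl() {
    case "$1" in
        start|stop|status) "$LIVY_HOME/bin/livy-server" "$1" ;;
        *) echo "usage: livy_ctl start|stop|status" >&2; return 2 ;;
    esac
}
```

For example, livy_ctl status reports whether the server is running, and livy_ctl stop followed by livy_ctl start picks up configuration changes.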

 

Step 7: Test Livy

You can use one of the recommended test commands from another node:

 

curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" http://<LIVY_HOST>:8998/sessions

 

You can also test from a web browser:

 

http://<LIVY_HOST>:8998
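
Creating a session only proves that the server answers. To exercise Spark as well, you can follow the POST with a few more calls to the Livy REST API. The helpers below are a sketch: the function names and the LIVY_URL variable are my own, the sample response is abbreviated, and the statement helper does no JSON escaping, so it only suits simple snippets without quotes.

```shell
LIVY_URL="${LIVY_URL:-http://localhost:8998}"   # placeholder; use your Livy host

# Pull the numeric session id out of the JSON that POST /sessions returns.
session_id_from_json() {
    printf '%s\n' "$1" | grep -o '"id": *[0-9][0-9]*' | head -n 1 | sed 's/[^0-9]//g'
}

# GET /sessions/{id}: shows the session, including its state.
livy_session() { curl -s "$LIVY_URL/sessions/$1"; }

# POST /sessions/{id}/statements: run a snippet once the session is idle.
livy_statement() {
    curl -s -X POST -H "Content-Type: application/json" \
        --data "{\"code\": \"$2\"}" "$LIVY_URL/sessions/$1/statements"
}

# DELETE /sessions/{id}: end the session when you are finished.
livy_delete() { curl -s -X DELETE "$LIVY_URL/sessions/$1"; }

# Example with an abbreviated sample response (shape assumed, not captured).
session_id_from_json '{"id":0,"state":"starting","kind":"spark"}'
```

A typical sequence would be livy_session 0 until the state is idle, then livy_statement 0 "1 + 1", then livy_delete 0. If livy.server.auth.type is set to kerberos, the curl calls would also need a valid ticket and the --negotiate -u : options.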

 

Disclaimer: This article is contributed by an external user. The steps may not be verified by Cloudera and may not be applicable for all use cases and may be very specific to a particular distribution. Please follow with caution and at your own risk. If needed, raise a support case to get the confirmation.

Comments
New Contributor

I have set up Livy and conducted successful initial testing. My issue now is properly setting up Kerberos authentication within Livy.

 

I have a superuser account and will be adding proxyuser group & server entries for this account in core-site.xml.

 

What isn't clear is how exactly I set up the Kerberos entries in livy.conf. Take the following example section:

# Authentication support for Livy server
# Livy has a built-in SPnego authentication support for HTTP requests with below configurations.
livy.server.auth.type = kerberos
livy.server.auth.kerberos.principal = HTTP/server@DOMAIN.COM
livy.server.auth.kerberos.keytab = /path/to/http.keytab
livy.server.launch.kerberos.principal = superuser/server@DOMAIN.COM
livy.server.launch.kerberos.keytab = /path/to/superuser.keytab

 

1) Is the HTTP principal required or can I simply repeat the superuser principal & keytab?

2) If the HTTP principal is needed, how do I go about creating it under CDH 6.x?

 

Super Collaborator

Hi @PNCJeff 

I would recommend installing and using Livy Server in the CDP cluster. 

 

The Livy Kerberos configuration parameters are below:

livy.server.launch.kerberos.keytab=<LIVY_SERVER_PATH>/livy.keytab
livy.server.launch.kerberos.principal=livy/server@DOMAIN.COM
livy.server.auth.type=kerberos
livy.server.auth.kerberos.keytab=<LIVY_SERVER_PATH>/livy.keytab
livy.server.auth.kerberos.principal=HTTP/server@DOMAIN.COM
livy.server.auth.kerberos.name-rules=RULE:[2:$1@$0](rangeradmin@DOMAIN.COM)s/(.*)@DOMAIN.COM/ranger/\u000ARULE:[2:$1@$0](rangertagsync@DOMAIN.COM)s/(.*)@DOMAIN.COM/rangertagsync/\u000ARULE:[2:$1@$0](rangerusersync@DOMAIN.COM)s/(.*)@DOMAIN.COM/rangerusersync/\u000ARULE:[2:$1@$0](rangerkms@DOMAIN.COM)s/(.*)@DOMAIN.COM/keyadmin/\u000ARULE:[2:$1@$0](atlas@DOMAIN.COM)s/(.*)@DOMAIN.COM/atlas/\u000ADEFAULT\u000A