11-09-2021 09:25 PM
Installing and configuring Livy on CDH 6.x.x
Apache Livy, a REST service for submitting Spark jobs, is a preferred way to run Spark jobs on several Hadoop distributions, but it is not included in CDH. While we were preparing for a CDP migration, one of our use cases switched to Apache Airflow so that jobs could run without requiring an edge node (also called a "bastion node"). The team wanted to start using Airflow before the CDP migration, so they asked me to install Livy on a CDH edge node.
An online search for Livy on CDH turned up little useful information, but the Livy project site, https://livy.apache.org/, explains how to download and install it.
Step 1: Determine which account will be used to run Livy
Unless you configure SELinux or another access-control mechanism, Linux lets an application read or execute any file or program that its account can access. Livy therefore inherits the permissions of the account that runs it, so choose that account deliberately; a dedicated service account is usually the safest choice.
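If you opt for a dedicated local account, a small idempotent helper can check for it and create it only when it is missing. This is a sketch: the helper name and the account name livy are hypothetical, the useradd options assume RHEL/CentOS-style shadow-utils, and an account backed by Active Directory (for example through SSSD) would be created in AD instead.

```shell
#!/usr/bin/env bash
# Sketch: ensure a dedicated local service account exists for Livy.
# The account name passed in (e.g. "livy") is a hypothetical choice.
ensure_service_account() {
  local user="${1:-}"
  if [ -z "$user" ]; then
    echo "usage: ensure_service_account USER" >&2
    return 2
  fi
  # id exits 0 when the account is already known (local, SSSD, or LDAP).
  if id -u "$user" >/dev/null 2>&1; then
    echo "account $user already exists"
    return 0
  fi
  # --system: low UID and no password aging; --create-home: a home
  # directory for the account's own files.
  sudo useradd --system --create-home --shell /bin/bash "$user"
}

# Example (hypothetical account name):
#   ensure_service_account livy
```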
Step 2: Set up a keytab
Livy needs a Kerberos principal to launch Spark jobs on a Kerberized cluster. If your CDH deployment uses Active Directory principals, this principal will be managed in AD, outside your Hadoop platform. You can use a tool such as ktutil to create a keytab for the principal.
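A keytab can be created interactively with MIT Kerberos ktutil and then checked before Livy depends on it. The sketch below shows the ktutil session as comments plus a hypothetical check_keytab helper. The principal, keytab path, key version number (-k 1), encryption type, and the livy account are assumptions; they must match the key version and encryption types your KDC (or Active Directory) actually holds for the principal.

```shell
#!/usr/bin/env bash
# Creating the keytab with MIT ktutil (interactive; values are examples):
#   $ ktutil
#   ktutil:  addent -password -p livy@EXAMPLE.COM -k 1 -e aes256-cts-hmac-sha1-96
#   (type the principal's password when prompted)
#   ktutil:  wkt /etc/security/keytabs/livy.keytab
#   ktutil:  quit
#   $ sudo chown livy /etc/security/keytabs/livy.keytab   # hypothetical account
#   $ sudo chmod 400 /etc/security/keytabs/livy.keytab

# Hypothetical helper: confirm the keytab is readable, list its entries,
# and prove it can obtain a ticket without a password.
check_keytab() {
  local keytab="${1:-}" principal="${2:-}" ccdir rc=0
  if [ ! -r "$keytab" ]; then
    echo "cannot read keytab: $keytab" >&2
    return 1
  fi
  # -k lists the keys in a keytab; -t adds each entry's timestamp.
  klist -kt "$keytab" || return 1
  # Use a throwaway credential cache so the caller's tickets are untouched.
  ccdir="$(mktemp -d)" || return 1
  kinit -c "FILE:$ccdir/cc" -kt "$keytab" "$principal" || rc=$?
  rm -rf "$ccdir"
  return "$rc"
}

# Example, run as the Livy account:
#   check_keytab /etc/security/keytabs/livy.keytab livy@EXAMPLE.COM
```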
Step 3: Set up your server to run Livy
Livy requires the basic Java, Hadoop, and Spark environment variables. On a CDH 6 parcel installation, set them as follows:
export JAVA_HOME=/usr/java/default/jre
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
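Before going further, it can help to confirm that each of these variables is set and points at an existing directory. A minimal bash sketch; the function name is hypothetical and it relies on bash indirect expansion (${!name}).

```shell
#!/usr/bin/env bash
# Sketch: verify the Step 3 variables are set and name real directories.
check_livy_env() {
  local name missing=0
  for name in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR SPARK_HOME; do
    if [ -z "${!name:-}" ]; then
      echo "$name is not set" >&2
      missing=1
    elif [ ! -d "${!name}" ]; then
      echo "$name=${!name} is not a directory" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_livy_env && echo "environment looks complete"
```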
Step 4: Download and install Livy
Download the Livy package zip file from https://livy.apache.org/download/ using wget. For example:
cd /var/tmp
wget https://dlcdn.apache.org/incubator/livy/0.7.1-incubating/apache-livy-0.7.1-incubating-bin.zip
Unzip the resulting zip file:
unzip apache-livy-0.7.1-incubating-bin.zip
Deploy the package and create a "default" symlink that points to the versioned directory:
mkdir -p /opt/livy
mv /var/tmp/apache-livy-0.7.1-incubating-bin /opt/livy/
ln -s /opt/livy/apache-livy-0.7.1-incubating-bin /opt/livy/default
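Apache download pages normally publish a checksum next to each release artifact, so it is worth verifying the downloaded zip, ideally before the unzip step. A hedged sketch, assuming GNU coreutils sha512sum is installed and that you paste only the hex digest from the checksum file linked on https://livy.apache.org/download/; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-512 digest with a published value.
# Spaces, line breaks, and upper-case hex in the expected value are
# normalized away first (some checksum files use that layout).
verify_sha512() {
  local file="${1:-}" expected="${2:-}" actual
  if [ ! -f "$file" ]; then
    echo "no such file: $file" >&2
    return 1
  fi
  actual="$(sha512sum "$file" | awk '{print $1}')"
  expected="$(printf '%s' "$expected" | tr -d '[:space:]' | tr 'A-F' 'a-f')"
  if [ "$actual" = "$expected" ]; then
    echo "checksum OK: $file"
  else
    echo "checksum MISMATCH: $file" >&2
    return 1
  fi
}

# Example:
#   verify_sha512 /var/tmp/apache-livy-0.7.1-incubating-bin.zip '<hex digest>'
```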
Step 5: Set up livy.conf
Livy ships its configuration files as templates (*.template) in its conf directory; copy a template to the real file name before editing it. You need to configure livy.conf:
cp /opt/livy/default/conf/livy.conf.template /opt/livy/default/conf/livy.conf
Edit the newly created livy.conf file and add two lines to the commented-out Kerberos section:
livy.server.launch.kerberos.principal=${KERBEROS_PRINCIPAL}
livy.server.launch.kerberos.keytab=${KERBEROS_KEYTAB}
Replace ${KERBEROS_PRINCIPAL} with the full Kerberos principal name, including the realm, and ${KERBEROS_KEYTAB} with the full path to the keytab file.
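If you prefer to script this edit, a small helper can set a livy.conf property idempotently: it rewrites the first line (commented out or not) that defines the key, drops later active duplicates, and appends the key if it is absent. This is a sketch assuming the "key = value" layout used in livy.conf.template; the helper name, principal, and keytab path in the example are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set "KEY = VALUE" in a livy.conf-style file.
# KEY and VALUE reach awk through ENVIRON, so backslashes are not reinterpreted.
set_livy_property() {
  local conf="${1:-}" key="${2:-}" value="${3:-}" tmp rc
  tmp="$(mktemp)" || return 1
  KEY="$key" VALUE="$value" awk '
    {
      line = $0
      sub(/^[ \t]+/, "", line)                # ignore leading whitespace
      commented = sub(/^#[ \t]*/, "", line)   # 1 if the line was commented out
      name = line
      sub(/[ \t]*=.*/, "", name)              # text before the first "="
      if (index(line, "=") > 0 && name == ENVIRON["KEY"]) {
        if (!done) { print ENVIRON["KEY"] " = " ENVIRON["VALUE"]; done = 1; next }
        if (!commented) next                  # drop later active duplicates
      }
      print
    }
    END { if (!done) print ENVIRON["KEY"] " = " ENVIRON["VALUE"] }
  ' "$conf" > "$tmp" && cat "$tmp" > "$conf"
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example (principal and keytab path are placeholders):
#   conf=/opt/livy/default/conf/livy.conf
#   set_livy_property "$conf" livy.server.launch.kerberos.principal 'livy@EXAMPLE.COM'
#   set_livy_property "$conf" livy.server.launch.kerberos.keytab /etc/security/keytabs/livy.keytab
```

Writing through cat into the existing file, rather than moving the temporary file over it, keeps livy.conf's original owner and permissions.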
Step 6: Run the Livy server
The Livy server runs as a background process. This article doesn't cover running it as a service that starts automatically. Start it with:
/opt/livy/default/bin/livy-server start
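Two details are easy to miss at this step: the server should run under the account chosen in Step 1, and the REST port takes a few seconds to answer after startup. The sketch below assumes sudo rights, LIVY_HOME pointing at the /opt/livy/default symlink, and that bin/livy-server sources conf/livy-env.sh (copied from livy-env.sh.template), which is where the Step 3 exports belong because sudo resets the environment. Function names, the livy account, and the 60-second default timeout are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: start Livy as a dedicated account and wait for its REST API.
LIVY_HOME="${LIVY_HOME:-/opt/livy/default}"

start_livy_as() {
  local user="${1:-}"
  # sudo resets the environment, so JAVA_HOME, HADOOP_CONF_DIR, etc. are
  # expected to come from $LIVY_HOME/conf/livy-env.sh (assumption).
  # The account also needs write access to $LIVY_HOME/logs.
  sudo -u "$user" "$LIVY_HOME/bin/livy-server" start
}

# Poll GET /sessions (read-only) until Livy responds or the timeout,
# in seconds, expires.
wait_for_livy() {
  local url="${1:-http://localhost:8998}" timeout="${2:-60}" waited=0
  until curl -sf -o /dev/null "$url/sessions"; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Livy did not answer at $url within ${timeout}s; check $LIVY_HOME/logs" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Livy is up at $url"
}

# Example (hypothetical account name):
#   start_livy_as livy && wait_for_livy http://localhost:8998 60
```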
Step 7: Test Livy
From another node, you can test the server by creating an interactive Spark session; Livy listens on port 8998 by default:
curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" http://<LIVY_HOST>:8998/sessions
You can also test from a web browser:
http://<LIVY_HOST>:8998
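For a scripted end-to-end check, the session created by the curl request above can be polled until it reaches the idle state and then deleted, so the test does not leave a Spark application holding YARN resources. This sketch parses Livy's single-line JSON with sed to avoid extra dependencies; that is fragile, and jq is a better choice where it is installed. Function names, the 60 polls at 5-second intervals, and the default URL are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pull a top-level string or number field out of
# Livy's single-line JSON response. Crude by design; prefer jq if available.
livy_json_field() {
  sed -n "s/.*\"$1\":\"\{0,1\}\([^\",}]*\)\"\{0,1\}[,}].*/\1/p" | head -n 1
}

# Create a Spark session, wait for it to become idle, then delete it.
livy_smoke_test() {
  local livy="${1:-http://localhost:8998}" id state="" i
  id="$(curl -sf -X POST -H 'Content-Type: application/json' \
        --data '{"kind": "spark"}' "$livy/sessions" | livy_json_field id)"
  if [ -z "$id" ]; then
    echo "could not create a session at $livy" >&2
    return 1
  fi
  # A session moves from starting to idle once its Spark driver is running.
  for ((i = 0; i < 60; i++)); do
    state="$(curl -sf "$livy/sessions/$id" | livy_json_field state)"
    case "$state" in
      idle|error|dead|killed) break ;;
    esac
    sleep 5
  done
  echo "session $id finished polling in state: ${state:-unknown}"
  # Always clean up so no Spark application is left on the cluster.
  curl -sf -X DELETE "$livy/sessions/$id" >/dev/null
  [ "$state" = idle ]
}

# Example:
#   livy_smoke_test http://<LIVY_HOST>:8998
```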
Disclaimer: This article is contributed by an external user. The steps may not be verified by Cloudera and may not be applicable for all use cases and may be very specific to a particular distribution. Please follow with caution and at your own risk. If needed, raise a support case to get the confirmation.