Created on 11-09-2021 09:25 PM - edited on 11-09-2021 09:27 PM by subratadas
Livy is a preferred way to run Spark jobs on several Hadoop installations, but not on CDH. While preparing for a CDP migration, the team behind one of our use cases switched to Apache Airflow so it could run jobs without an edge node or "bastion node". The team wanted to begin using Airflow before the CDP migration, so they asked me to install Livy on a CDH edge node.
An online search for Livy on CDH returned little helpful information, but I did find instructions for downloading and installing it at https://livy.apache.org/.
By default, Linux lets an application read or execute any file that the account running it can access, unless SELinux or another access-control tool restricts it. Pick the account that will run Livy accordingly.
You'll need a Kerberos principal. If you use Active Directory principals with your CDH deployment, then this account will be outside of your Hadoop platform. You can use a tool like ktutil to create a keytab for your Kerberos principal.
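As a rough illustration of the ktutil step, the sketch below prints the commands you would type at the interactive ktutil prompt (MIT Kerberos). The helper name ktutil_commands, the default key version number of 1, and the aes256-cts-hmac-sha1-96 encryption type are assumptions for illustration; the kvno and encryption type have to match what your KDC or Active Directory holds for the account.

```shell
# Hypothetical helper: print the commands to type at the interactive
# ktutil prompt. It does not run ktutil itself, because
# "addent -password" prompts for the account password.
ktutil_commands() {
  local principal="$1" keytab="$2"
  local kvno="${3:-1}" enctype="${4:-aes256-cts-hmac-sha1-96}"  # assumed defaults
  printf 'addent -password -p %s -k %s -e %s\n' "$principal" "$kvno" "$enctype"
  printf 'wkt %s\n' "$keytab"   # write the in-memory entry to the keytab file
  printf 'quit\n'
}
```

For example, `ktutil_commands livy@EXAMPLE.COM /path/to/livy.keytab` prints three lines to enter at the prompt (the principal and path are placeholders). Afterwards, `klist -kt <keytab>` lists the entries in the keytab and `kinit -kt <keytab> <principal>` confirms that it can obtain a ticket.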
Livy requires the basic Hadoop and Spark environment variables.
export JAVA_HOME=/usr/java/default/jre
export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CONF_DIR=/etc/hadoop/conf
export SPARK_HOME=/opt/cloudera/parcels/CDH/lib/spark
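Before going further, it can help to confirm that these four variables are set and point to real directories on the edge node. The following is a minimal sketch; the function name check_livy_env is hypothetical and is not part of Livy or CDH.

```shell
# Hypothetical helper: verify that each variable Livy relies on is set
# and names an existing directory. Returns non-zero if any check fails.
check_livy_env() {
  local status=0 name value
  for name in JAVA_HOME HADOOP_HOME HADOOP_CONF_DIR SPARK_HOME; do
    eval "value=\${$name:-}"    # look up the variable whose name is in $name
    if [ -z "$value" ]; then
      echo "MISSING: $name is not set" >&2
      status=1
    elif [ ! -d "$value" ]; then
      echo "BAD PATH: $name=$value is not a directory" >&2
      status=1
    fi
  done
  return "$status"
}
```

Running `check_livy_env` in the shell that will start Livy reports every variable that is unset or points to a missing directory, rather than stopping at the first problem.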
cd /var/tmp
wget https://dlcdn.apache.org/incubator/livy/0.7.1-incubating/apache-livy-0.7.1-incubating-bin.zip
unzip apache-livy-0.7.1-incubating-bin.zip
mkdir /opt/livy
mv /var/tmp/apache-livy-0.7.1-incubating-bin /opt/livy/
ln -s /opt/livy/apache-livy-0.7.1-incubating-bin /opt/livy/default
cp /opt/livy/default/conf/livy.conf.template /opt/livy/default/conf/livy.conf
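Then set the Kerberos principal and keytab in livy.conf, replacing the placeholders with your own values:
livy.server.launch.kerberos.principal=${KERBEROS_PRINCIPAL}
livy.server.launch.kerberos.keytab=${KERBEROS_KEYTAB}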
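If you would rather script the livy.conf changes than edit the file by hand, something like the following could work. It is only a sketch: set_conf is a hypothetical helper, and it assumes livy.conf uses plain key = value lines.

```shell
# Hypothetical helper: set KEY = VALUE in a properties-style file,
# dropping any earlier uncommented definition of the same key.
set_conf() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)" || return 1
  awk -v k="$key" '{
    line = $0
    sub(/^[ \t]+/, "", line)            # ignore leading whitespace
    n = index(line, "=")
    name = (n > 0) ? substr(line, 1, n - 1) : line
    sub(/[ \t]+$/, "", name)            # ignore whitespace before "="
    if (name != k) print                # keep everything except the old entry
  }' "$file" > "$tmp"
  printf '%s = %s\n' "$key" "$value" >> "$tmp"
  cat "$tmp" > "$file"                  # overwrite in place, keeping owner and mode
  rm -f "$tmp"
}

# Example call (principal is a placeholder):
# set_conf /opt/livy/default/conf/livy.conf livy.server.launch.kerberos.principal livy@EXAMPLE.COM
```

Commented template lines (those starting with #) are left untouched, so the file stays readable next to the original template.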
Livy server runs as a background process. This article doesn't discuss how to run it as a service that starts automatically.
/opt/livy/default/bin/livy-server start
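Because the server starts in the background, the start command returns before Livy is ready to accept requests. One way to wait for it is to poll the REST endpoint, as sketched below. The helper name wait_for_livy and the default timeout are assumptions, and the check expects an unauthenticated server that answers GET /sessions with HTTP 200.

```shell
# Hypothetical helper: poll the Livy REST endpoint until it answers with
# HTTP 200 or roughly TIMEOUT seconds of waiting have passed.
wait_for_livy() {
  local url="$1" timeout="${2:-60}" waited=0 code
  while [ "$waited" -lt "$timeout" ]; do
    code="$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$url/sessions" 2>/dev/null)"
    if [ "$code" = "200" ]; then
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Livy did not respond at $url within about ${timeout}s" >&2
  return 1
}
```

For example, `wait_for_livy http://localhost:8998 120` run on the edge node returns 0 once the server is up. If it times out, the server log is the first place to look.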
You can use one of the recommended test commands from another node:
curl -X POST --data '{"kind": "spark"}' -H "Content-Type: application/json" http://<LIVY_HOST>:8998/sessions
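The POST above returns a JSON document describing the new session, which is not usable until its state becomes idle. The sketch below polls the session until then. The helper names session_state and wait_for_idle are hypothetical, and the sed-based parsing is a shortcut that assumes the response arrives on a single line; a JSON tool such as jq would be more robust.

```shell
# Hypothetical helper: read a Livy session JSON document on stdin and
# print the value of its "state" field.
session_state() {
  sed -n 's/.*"state"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: poll GET /sessions/<id> until the session is idle.
# Returns non-zero if the session fails or never becomes idle.
wait_for_idle() {
  local base="$1" id="$2" tries="${3:-30}" state
  while [ "$tries" -gt 0 ]; do
    state="$(curl -s --max-time 5 "$base/sessions/$id" | session_state)"
    case "$state" in
      idle) return 0 ;;
      error|dead|killed|shutting_down)
        echo "session $id is $state" >&2
        return 1 ;;
    esac
    sleep 2
    tries=$((tries - 1))
  done
  echo "session $id did not become idle" >&2
  return 1
}
```

Once `wait_for_idle http://<LIVY_HOST>:8998 0` succeeds, you can POST a statement such as `{"code": "1 + 1"}` to /sessions/0/statements, and remove the test session afterwards with a DELETE request to /sessions/0.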
You can also test from a web browser:
http://<LIVY_HOST>:8998
Disclaimer: This article is contributed by an external user. The steps may not be verified by Cloudera and may not be applicable for all use cases and may be very specific to a particular distribution. Please follow with caution and at your own risk. If needed, raise a support case to get the confirmation.
Created on 11-06-2022 09:00 AM - edited 11-06-2022 09:03 AM
I have set up Livy and conducted successful initial testing. My issue now is properly setting up Kerberos authentication within Livy.
I have a superuser account and will be adding proxyuser group & server entries for this account in core-site.xml.
What isn't clear is exactly how to set up the Kerberos entries in livy.conf. Taking the following example section:
# Authentication support for Livy server
# Livy has a built-in SPnego authentication support for HTTP requests with below configurations.
livy.server.auth.type = kerberos
livy.server.auth.kerberos.principal = HTTP/server@DOMAIN.COM
livy.server.auth.kerberos.keytab = /path/to/http.keytab
livy.server.launch.kerberos.principal = superuser/server@DOMAIN.COM
livy.server.launch.kerberos.keytab = /path/to/superuser.keytab
1) Is the HTTP principal required or can I simply repeat the superuser principal & keytab?
2) If the HTTP principal is needed, how do you go about creating it under CDH 6.x?
Created on 11-07-2022 02:09 AM
Hi @PNCJeff
I would recommend installing and using Livy Server in the CDP cluster.
The Livy Kerberos configuration parameters are below:
livy.server.launch.kerberos.keytab=<LIVY_SERVER_PATH>/livy.keytab
livy.server.launch.kerberos.principal=livy/server@DOMAIN.COM
livy.server.auth.type=kerberos
livy.server.auth.kerberos.keytab=<LIVY_SERVER_PATH>/livy.keytab
livy.server.auth.kerberos.principal=HTTP/server@DOMAIN.COM
livy.server.auth.kerberos.name-rules=RULE:[2:$1@$0](rangeradmin@DOMAIN.COM)s/(.*)@DOMAIN.COM/ranger/\u000ARULE:[2:$1@$0](rangertagsync@DOMAIN.COM)s/(.*)@DOMAIN.COM/rangertagsync/\u000ARULE:[2:$1@$0](rangerusersync@DOMAIN.COM)s/(.*)@DOMAIN.COM/rangerusersync/\u000ARULE:[2:$1@$0](rangerkms@DOMAIN.COM)s/(.*)@DOMAIN.COM/keyadmin/\u000ARULE:[2:$1@$0](atlas@DOMAIN.COM)s/(.*)@DOMAIN.COM/atlas/\u000ADEFAULT\u000A
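The settings above point both keytab entries at the same file, which suggests that one keytab is expected to hold both the livy/ and the HTTP/ principal. A quick way to check what a keytab actually contains is to filter the output of klist -kt, as in this sketch. The helper name keytab_has_principal is hypothetical, and it assumes the principal is the last column of each klist line (that is, klist is run without -e).

```shell
# Hypothetical helper: read "klist -kt <keytab>" output on stdin and
# succeed only if some entry's principal matches the first argument.
keytab_has_principal() {
  awk -v want="$1" '$NF == want { found = 1 } END { exit found ? 0 : 1 }'
}
```

For example, `klist -kt <LIVY_SERVER_PATH>/livy.keytab | keytab_has_principal HTTP/server@DOMAIN.COM` returns 0 only when the HTTP principal is present. With livy.server.auth.type=kerberos enabled, a client that holds a ticket from kinit can then test the endpoint with `curl --negotiate -u : http://<LIVY_HOST>:8998/sessions`.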