Zeppelin, Livy, Hive, Kerberos & Spark 1.6

New Contributor

When using Zeppelin with Livy on a kerberized CDH 5.10 cluster and trying to access Hive, I ran into this issue:

https://issues.apache.org/jira/browse/SPARK-13478

Since Hive on Spark is supported only on Spark 1.6 in CDH 5.10, not on Spark 2.0, when will the fix be backported?

The fix is available for Spark 1.6.4, so one option would be to upgrade Hive on Spark to support Spark 1.6.4. What is your timeline for this?

Also, when will you provide packaging for Livy?

Thank you.

 

1 ACCEPTED SOLUTION

New Contributor

Solved this.

Hive from Zeppelin with Livy and Spark 2

Add Spark 2 to the kerberized cluster:

https://www.cloudera.com/downloads/spark2/2-0.html

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_addon_services.html#concept_qb...

Adapt Livy to recognize Spark 2 if required:

https://community.cloudera.com/t5/Web-UI-Hue-Beeswax/Spark-2-0-livy-server-3/td-p/48562

 

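# Build Livy for Spark 2 (run from the Livy source directory)
mvn clean package -DskipTests -Dspark-2.0 -Dscala-2.11

# Make Livy Hive-aware by exposing hive-site.xml in the Spark conf directory
sudo ln -s /etc/hive/conf/hive-site.xml /etc/spark/conf/hive-site.xml

# Create a livy user
sudo useradd livy

# Create a logs directory (relative to the current directory)
mkdir logs

# Give the livy user ownership of everything in the current directory
sudo chown -R livy:livy ./*

# Start Livy as the livy user with the right config (see below)
sudo -u livy /opt/livy/bin/livy-server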

livy-env.sh
===============================================
export SPARK_HOME=/opt/cloudera/parcels/SPARK2/lib/spark2
export HADOOP_HOME=/opt/cloudera/parcels/CDH
export SPARK_CONF_DIR=/etc/spark/conf
export HADOOP_CONF_DIR=/etc/hadoop/conf

livy.conf
===============================================
# What spark master Livy sessions should use.
livy.spark.master = yarn

# What spark deploy mode Livy sessions should use.
livy.spark.deployMode = cluster

# If livy should impersonate the requesting users when creating a new session.
livy.impersonation.enabled = true

# Whether to enable HiveContext in the Livy interpreter. If true, hive-site.xml is looked up
# first in the user request and then on the Livy server classpath.
livy.repl.enableHiveContext = true

livy.server.launch.kerberos.keytab = /opt/livy/livy.keytab
livy.server.launch.kerberos.principal = livy/server.fqdn@XXX

livy.server.auth.type = kerberos

livy.server.auth.kerberos.keytab = /opt/livy/spnego.keytab
livy.server.auth.kerberos.principal = HTTP/server.fqdn@XXX

livy.server.access_control.enabled = true
livy.server.access_control.users = zeppelin,livy

livy.superusers = zeppelin,livy
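
When livy.conf is edited by hand, the same key can easily end up defined twice, and it is not always obvious which value wins. The following is a minimal sketch of a check for that. It is not part of Livy: the function name conf_duplicate_keys is made up for illustration, and the path in the usage line is an assumption based on the /opt/livy install location used above.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with Livy: print every key that is
# defined more than once in a properties-style file such as livy.conf.
conf_duplicate_keys() {
    # Drop comment and blank lines, keep only the text before the first '=',
    # trim surrounding whitespace, then report keys that occur more than once.
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" \
        | sed -e 's/=.*$//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
        | sort | uniq -d
}

# Example (assumed path): conf_duplicate_keys /opt/livy/conf/livy.conf
```

Both "key = value" and "key=value" spellings are treated as the same key, so mixed styles do not hide a duplicate.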

 

You must also configure the Zeppelin Livy interpreter for Kerberos authentication:

zeppelin.livy.keytab	/opt/zeppelin/zeppelin.keytab
zeppelin.livy.principal	zeppelin@XXX
zeppelin.livy.url	http://host.fqdn:8998
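
Before testing from a Zeppelin notebook, it can help to confirm that the Livy endpoint accepts SPNEGO authentication at all. This is only a sketch: it assumes the keytab, principal and URL placeholders from the settings above, that kinit is installed, and that curl was built with GSS-API/SPNEGO support. The helper names livy_sessions_url and check_livy are illustrative.

```shell
#!/bin/sh
# Illustrative helper: build the Livy REST endpoint that lists sessions.
livy_sessions_url() {
    # Strip one trailing slash so the result never contains "//sessions".
    printf '%s/sessions\n' "${1%/}"
}

# Obtain a Kerberos ticket from the keytab, then call Livy using SPNEGO.
# Keytab and principal are the placeholder values from this post.
check_livy() {
    kinit -kt /opt/zeppelin/zeppelin.keytab zeppelin@XXX || return 1
    curl --silent --show-error --fail --negotiate -u : \
        "$(livy_sessions_url "$1")"
}

# Example (not run automatically): check_livy http://host.fqdn:8998
```

If the call returns a JSON list of sessions, the Kerberos side of Livy is working and any remaining problem is more likely in the Zeppelin or Shiro configuration.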

Finally, you must configure Shiro. I used an LDAP backend with the org.apache.zeppelin.realm.LdapGroupRealm plugin to enable LDAP user group awareness (take care to set up groupOfNames, not posixGroup...).

Expert Contributor

Hi,

 

One thing to be careful about: when you run "Deploy Client Configuration" on your Spark2 service, it removes the hive-site.xml symlink (or the file itself, if you copied it).

I have noticed that all these config files are in $SPARK_CONF_DIR/yarn-conf/, so I wish Livy could also load them when it starts Spark.

Expert Contributor
OK, I have tried it, and it seems best to copy hive-site.xml into livy/conf/; Livy then loads it in every session.

Best,

New Contributor
Hi,

Set this in livy-env.sh instead to get it working in a more maintainable way:
export HADOOP_CONF_DIR=/etc/hive/conf
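
Pointing HADOOP_CONF_DIR at the Hive directory only helps if that directory also carries the Hadoop and YARN client files that Spark on YARN reads from it. A small sketch to check this before changing livy-env.sh; the helper name missing_conf_files and the list of expected files are my assumptions, so adjust them to your cluster.

```shell
#!/bin/sh
# Hypothetical helper: print which of the expected client config files are
# absent from a directory. The file list is an assumption, not a Livy rule.
missing_conf_files() {
    for f in hive-site.xml core-site.xml hdfs-site.xml yarn-site.xml; do
        [ -f "$1/$f" ] || printf '%s\n' "$f"
    done
}

# Example (not run automatically): missing_conf_files /etc/hive/conf
```

No output means all four files are present in the directory you pass in.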