
How to make Zeppelin's User Impersonation work with Kerberos in HDP 2.6?

Expert Contributor

I'm trying to make Zeppelin notebooks run as the logged-in user for the %sh and %spark interpreters, with centralized users provided by LDAP + Kerberos through SSSD.

I was able to make this work in a NON Kerberized cluster by using the steps suggested in the following links:

However, this doesn't work in a Kerberized cluster where user identity and authentication are handled by SSSD from LDAP + Kerberos.

The problem is that the "hack" Zeppelin uses to run as the requested user is to put the "zeppelin" user in "sudoers" and define the variable ZEPPELIN_IMPERSONATE_CMD so that it prepends a

sudo su - ${ZEPPELIN_IMPERSONATE_USER} bash -c

before the execution of the interpreter.
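For reference, a minimal sketch of how that hook might look in zeppelin-env.sh, assuming the command quoted above is used verbatim and that the zeppelin user already has passwordless sudo rights. The single quotes and the trailing space are assumptions about how the value should be written: the quotes leave ${ZEPPELIN_IMPERSONATE_USER} unexpanded when the file is sourced, so it can be resolved later, when the interpreter is launched for the logged-in user.

```shell
# zeppelin-env.sh (fragment): a sketch, not a verified configuration.
# Single quotes keep ${ZEPPELIN_IMPERSONATE_USER} literal at this point;
# it is expected to be filled in when the interpreter process is launched.
export ZEPPELIN_IMPERSONATE_CMD='sudo su - ${ZEPPELIN_IMPERSONATE_USER} bash -c '
```

Note that nothing in this command asks for a password or calls kinit, which is why the impersonated session starts without a Kerberos ticket.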

This fails because the initial login to Zeppelin is done through LDAP, so no Kerberos ticket is issued. Later, "sudo" from a privileged user switches to the requested user, but since no password is provided, the "authentication" stage of SSSD is never reached and the "kinit" needed to contact the Kerberos KDC and obtain the user's Kerberos TGT (ticket-granting ticket) never runs.
For this reason, local commands on the Linux host running Zeppelin work, but any command executed against the kerberized cluster from the %sh interpreter, for example "hdfs dfs -ls" or "yarn application -list", WILL FAIL with a message saying the required TGT is missing:

javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]

The same happens when using the %spark2 interpreter, for the very same reason.

The same problem happens if you log into the edge server as root and do a "su - <user>" to a regular user, but in this case I can execute a "kinit" manually and provide the credentials to get the Kerberos tickets. After this all works as expected.
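A quick way to see this from a shell (or from a %sh paragraph) is to test for a valid credential cache before calling any cluster command. The sketch below assumes the MIT Kerberos client tools are installed; `klist -s` prints nothing and only sets the exit status. The function names `has_tgt` and `tgt_status` are made up for illustration.

```shell
# Succeeds (exit status 0) only if klist is installed and the current
# user's credential cache holds valid, non-expired tickets.
has_tgt() {
  command -v klist >/dev/null 2>&1 && klist -s
}

# Prints a one-line diagnosis for the user running the command.
tgt_status() {
  if has_tgt; then
    echo "$(id -un): valid Kerberos ticket found"
  else
    echo "$(id -un): no valid Kerberos ticket, run kinit first"
  fi
}
```

Calling `tgt_status` right after the "su - <user>" switch should report the missing ticket; after a manual "kinit" it should report a valid one.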

The only workaround I was able to find is to ask the user to log in with ssh to the Zeppelin server, so that he provides his password and gets the TGT. After this, Zeppelin impersonation works (once the ticket has been obtained from Kerberos, any session of that user shares it).

I guess this might work if the login to Zeppelin (using Apache Shiro) were done using Kerberos instead of LDAP, because this would trigger the required kinit, but I was not able to find any documentation on how to do that.

Does anybody know how to make this work when using Kerberos with LDAP and an MIT KDC (not AD)?

Best regards.

1 ACCEPTED SOLUTION

Expert Contributor

This feature is not (well) documented anywhere, and the instructions for user impersonation in Zeppelin's manual only work for the shell or Spark interpreters when Kerberos is not in use, so I will answer myself and show how I was able to make this work with Kerberos and HDP 3.1 (Zeppelin 0.8.0).

First, you DON'T have to change or uncomment ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER in zeppelin-env; leave it at its default value "true" (meaning that Zeppelin will use the --proxy-user option when impersonation is enabled in the spark2 interpreter).

Then, after you have kerberized the cluster, edit the Spark2 interpreter and change the following:

  • Set the interpreter to be instantiated "Per User | Isolated" and select the "User Impersonate" checkbox.
  • Remove the following properties from the interpreter configuration:

 

spark.yarn.keytab=/etc/security/keytabs/zeppelin.server.kerberos.keytab 
spark.yarn.principal=zeppelin-mycluster@MYREALM.COM

 

  • Add the following properties to the interpreter (adjust the domain and Kerberos realm names to your environment):

 

zeppelin.spark.keytab=/etc/security/keytabs/zeppelin.server.kerberos.keytab
zeppelin.spark.principal=zeppelin-mycluster@MYREALM.COM

 

  • Save and restart the interpreter

After this, you should be able to run your %spark2.* jobs from Zeppelin as the logged-in user.
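Since the principal name has to match what is actually stored in the keytab, it may help to list the keytab entries before filling in zeppelin.spark.principal. A small sketch, assuming the MIT klist tool and the keytab path used above; the function name `show_keytab_principals` is hypothetical.

```shell
# Lists the principals stored in a keytab (klist -k reads a keytab,
# -t adds timestamps). Returns 1 if the file cannot be read.
show_keytab_principals() {
  keytab="$1"
  if [ ! -r "$keytab" ]; then
    echo "keytab not readable: $keytab" >&2
    return 1
  fi
  klist -kt "$keytab"
}

# Example call (typically as root or as the zeppelin user):
# show_keytab_principals /etc/security/keytabs/zeppelin.server.kerberos.keytab
```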

 

 


7 REPLIES

Expert Contributor

@Geoffrey Shelton Okot
Thank you for your response. I agree that by default Zeppelin is designed to run as a single user (usually named zeppelin), but the official documentation for version 0.7.0 (the first link provided above) states that it adds support for USER IMPERSONATION (as opposed to the user proxy settings used in connection-oriented interpreters like %jdbc(hive) or %livy) for some execution-environment interpreters like %sh and %spark/%spark2.

This approach is more or less well documented, and it works OK with local or centralized users "without Kerberos" by using the ZEPPELIN_IMPERSONATE_USER variable (defined at login time) and the ZEPPELIN_IMPERSONATE_CMD hook in the zeppelin-env.sh file (and also setting ZEPPELIN_IMPERSONATE_SPARK_PROXY_USER=false so that Spark does not use the proxy user in this case).

The problem is that this approach doesn't fit well with LDAP/Kerberos (I really don't know whether it works with direct Active Directory login into Zeppelin), because the LDAP login doesn't use Kerberos (or at least I haven't found how to make it do so). For this reason the impersonated user doesn't have a valid ticket when the zeppelin user switches to him (by using sudo) to launch the shell interpreters.

From what I have checked in the Apache Shiro documentation for Zeppelin 0.7.0, there is no support for direct authentication against Kerberos yet (only for LDAP, AD or PAM), so it seems this problem cannot be solved at the moment, except maybe by using AD instead of LDAP + Kerberos.

I will test Spark support through %livy with the proxy user configuration, which may be a partial fix. Unfortunately, it is not a fully satisfactory solution for my deployment, because my users need to be able to run shell commands like "hdfs ...", "yarn ...", etc. from the shell interpreter as their own authenticated users (not as zeppelin), and currently this doesn't seem to be possible with this Zeppelin version on a kerberized cluster.

avatar

I am having a similar problem. My users log into the Zeppelin notebook and are authenticated by LDAP.

After that, they want to use 1) %livy2.pyspark and 2) %livy2.sql as their own login user, not as zeppelin.

But when I enable user impersonation in Livy, it fails.

My error is

org.apache.zeppelin.livy.LivyException: {"msg":"User 'zeppelin-goa_datalake_1' not allowed to impersonate 'Some(sameer.dalai)'."}

org.springframework.web.client.HttpClientErrorException: 403 Forbidden
    at org.springframework.web.client.DefaultResponseErrorHandler.handleError(DefaultResponseErrorHandler.java:91)
    at org.springframework.web.client.RestTemplate.handleResponse(RestTemplate.java:667)
    at org.springframework.web.client.RestTemplate.doExecute(RestTemplate.java:620)
    at org.springframework.security.kerberos.client.KerberosRestTemplate.doExecuteSubject(KerberosRestTemplate.java:202)
    at org.springframework.security.kerberos.client.KerberosRestTemplate.access$100(KerberosRestTemplate.java:67)
    at org.springframework.security.kerberos.client.KerberosRestTemplate$1.run(KerberosRestTemplate.java:191)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:360)


How do I enable data scientists to use Zeppelin to submit Spark jobs through the Livy interpreter?


 

 

Contributor

The provided workaround is tested and works on HDP 3.1. Keep in mind that, in our case, Zeppelin reverts to the default interpreter settings after a restart of the service.

New Contributor

To prevent Ambari from resetting the interpreter config (on Zeppelin restart), set the following property to false in Advanced zeppelin-site:

zeppelin.interpreter.config.upgrade=false

 

New Contributor

Does the solution work with %sh as well? Please elaborate.

New Contributor

Hi,
I am able to achieve user impersonation with the solution you provided. However, my Spark jobs now get stuck and keep running for hours, after which I get a GSSException error: failed to find a TGT to connect from the Zeppelin host to the YARN RM host.
When I removed the values from zeppelin.spark.keytab and zeppelin.spark.principal and put them back in spark.yarn.keytab and spark.yarn.principal, my jobs ran fine without any error, but Zeppelin impersonation fails in this case. What can I do to get user impersonation working and also get rid of the GSSException error for the YARN RM?