Zeppelin - HDP 3.1 - Spark integration with Hive using HWC not working in Kerberized cluster

Expert Contributor

Has anybody been able to make Spark-Hive integration using Hive Warehouse Connector work in Zeppelin with impersonation in a Kerberized cluster?

I have followed the steps in this article, and Spark works with HWC using spark-shell or pyspark from the console, but Spark in Zeppelin does not work when using impersonation with the given instructions, with either the %spark2 or %livy2 interpreter.
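For context, this is roughly how I am testing HWC from spark-shell (a minimal sketch; the database and table names are placeholders, and it assumes the HWC assembly jar plus the spark.sql.hive.hiveserver2.jdbc.url and spark.datasource.hive.warehouse.* properties are already configured as described in the article):

  // Build a HiveWarehouseSession on top of the existing SparkSession
  import com.hortonworks.hwc.HiveWarehouseSession
  val hive = HiveWarehouseSession.session(spark).build()

  // Simple smoke tests against HiveServer2 Interactive
  hive.showDatabases().show()
  hive.executeQuery("SELECT * FROM some_db.some_table LIMIT 10").show()

This works fine from the console with a valid Kerberos ticket; the problem only appears inside Zeppelin.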

In the case of the %spark2 interpreter it works ONLY if I disable impersonation and explicitly configure Zeppelin's Kerberos principal in the interpreter via the "spark.yarn.principal/keytab" properties. But then all jobs run as the "zeppelin" user, which is not an option for me in a multiuser environment.
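For reference, these are the interpreter properties I mean (the principal and keytab path are placeholders for my environment):

  spark.yarn.principal   zeppelin@EXAMPLE.COM
  spark.yarn.keytab      /etc/security/keytabs/zeppelin.server.kerberos.keytab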

If I don't use HWC, Spark works perfectly with user impersonation in both interpreters.

But with the %livy2 interpreter, or when I enable impersonation in the %spark2 interpreter, I get an error indicating that Spark cannot create a SQL connection pool because Kerberos GSS initiation failed:

java.lang.RuntimeException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Could not open client transport for any of the Server URI's in ZooKeeper: GSS initiate failed)
...
Caused by: java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: GSS initiate failed
  at shadehive.org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:333)
  ... 59 more
Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed
  at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)

I have tried all the workarounds: I checked the proxyuser configuration for both the zeppelin and livy principals in the Hadoop/HDFS core-site, and also defined both users as Livy superusers, but the problem persists.
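For completeness, this is the kind of proxyuser configuration I am referring to (a sketch only; the host and group values are placeholders, and the superusers entry lives in livy.conf rather than core-site):

  <!-- core-site.xml -->
  <property>
    <name>hadoop.proxyuser.zeppelin.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.zeppelin.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.livy.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.livy.groups</name>
    <value>*</value>
  </property>

  # livy.conf
  livy.superusers = zeppelin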

I suspect it may not be possible to use Spark/Hive integration with user impersonation in the case of the %spark2 interpreter, but this should at least be possible with the %livy2 interpreter.

Does anyone have an idea or suggestion on what extra configuration needs to be done in order to make HWC work inside Zeppelin?

Thanks in advance.


Re: Zeppelin - HDP 3.1 - Spark integration with Hive using HWC not working in Kerberized cluster

Rising Star

Additional comments:

- Notably, this issue represents a fundamental gap when it comes to leveraging Ranger authorization from Zeppelin.

- Impersonation is a must, as individual developers and end users need to use Zeppelin.




Re: Zeppelin - HDP 3.1 - Spark integration with Hive using HWC not working in Kerberized cluster

Expert Contributor

We have tested disabling user impersonation in the Livy interpreter, and HWC also works in that case. So the problem seems to be that we cannot use HWC with Spark from Zeppelin and impersonation at the same time.
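To be clear, by "disabling user impersonation" I mean roughly the following (a sketch; the property comes from the Livy server configuration, and the exact place it is set may differ in your setup):

  # livy.conf (Livy server side)
  livy.impersonation.enabled = false

With this setting the Livy sessions no longer run as the logged-in end user, which is why it is not an acceptable workaround for us.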

This seems like a HUGE drawback for using Zeppelin with HDP 3.x in a secured multi-user environment with policy-based access restrictions.
