Zeppelin - HDP 3.1 - Spark integration with Hive using HWC not working in Kerberized cluster

Expert Contributor

Has anybody been able to make Spark-Hive integration using Hive Warehouse Connector work in Zeppelin with impersonation in a Kerberized cluster?

I have followed the steps in this article, and Spark works with HWC from spark-shell or pyspark on the console, but Spark in Zeppelin does not work when using impersonation with the given instructions, with either the %spark2 or the %livy2 interpreter.
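
For reference, this is roughly what works from spark-shell outside Zeppelin, assuming the shell is launched with the HWC assembly jar and the Hive/LLAP connection properties from the article (a minimal sketch; database and table names are placeholders):

  // Scala, inside spark-shell launched with the HWC assembly jar on the classpath
  import com.hortonworks.hwc.HiveWarehouseSession
  val hive = HiveWarehouseSession.session(spark).build()
  hive.showDatabases().show()
  hive.executeQuery("SELECT * FROM some_db.some_table LIMIT 10").show()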

In the case of the %spark2 interpreter it works ONLY if I disable impersonation and explicitly configure Zeppelin's Kerberos principal in the interpreter via the spark.yarn.principal/spark.yarn.keytab properties. But then all jobs run as the "zeppelin" user, which is not an option for me in a multi-user environment.
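
This is roughly the (non-impersonated) %spark2 interpreter configuration that works (a sketch of my settings; the jar path, principal, hosts and version numbers are placeholders for our cluster):

  spark.yarn.principal                           zeppelin@EXAMPLE.COM
  spark.yarn.keytab                              /etc/security/keytabs/zeppelin.server.kerberos.keytab
  spark.jars                                     /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-1.0.0.3.1.0.0-78.jar
  spark.sql.hive.hiveserver2.jdbc.url            jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive
  spark.sql.hive.hiveserver2.jdbc.url.principal  hive/_HOST@EXAMPLE.COM
  spark.hadoop.hive.llap.daemon.service.hosts    @llap0
  spark.hadoop.hive.zookeeper.quorum             zk1:2181,zk2:2181,zk3:2181
  spark.datasource.hive.warehouse.metastoreUri   thrift://metastore-host:9083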

If I don't use HWC, Spark works perfectly with user impersonation in both interpreters.

But with the %livy2 interpreter, or when I enable impersonation in the %spark2 interpreter, I get an error: Spark cannot create a SQL connection pool because the Kerberos GSS initiation fails:

java.lang.RuntimeException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Could not open client transport for any of the Server URI's in ZooKeeper: GSS initiate failed)
...
Caused by: java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: GSS initiate failed
  at shadehive.org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:333)
  ... 59 more
Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed
  at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)

I have tried all the workarounds: checked the proxyuser configuration for both the zeppelin and livy principals in the Hadoop/HDFS core-site, and also defined both users as Livy superusers, but the problem persists.
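
Concretely, this is roughly what I have in place (a sketch; the hosts and groups are wildcarded here only for illustration):

  # core-site (via Ambari)
  hadoop.proxyuser.zeppelin.hosts  = *
  hadoop.proxyuser.zeppelin.groups = *
  hadoop.proxyuser.livy.hosts      = *
  hadoop.proxyuser.livy.groups     = *

  # livy.conf
  livy.impersonation.enabled = true
  livy.superusers            = zeppelin,livy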

I suspect it may not be possible to use Spark/Hive integration with user impersonation in the case of the %spark2 interpreter, but this has to be possible at least with the %livy2 interpreter.

Does anyone have an idea or suggestion about which extra configuration is needed to make HWC work inside Zeppelin?

Thanks in advance.

3 REPLIES

Rising Star

Additional comments:

- Notably, this issue represents a fundamental gap when it comes to leveraging Ranger authorization from Zeppelin.

- Impersonation is a must, as individual developers and end users need to use Zeppelin.




Expert Contributor

We have tested disabling user impersonation in the Livy interpreter, and HWC also works in that case. So the problem seems to be that we cannot use HWC with Spark from Zeppelin and user impersonation at the same time.
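
For reference, the knob we toggled for that test was the Livy server-side impersonation setting (as we understand the setup; with it disabled, sessions run as the livy service user instead of the notebook user):

  # livy.conf
  livy.impersonation.enabled = false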

This seems like a HUGE drawback for using Zeppelin with HDP 3.x in a secured multi-user environment with policy-based access restrictions.

Explorer

You can try the Livy interpreter to use impersonation at the same time.
