Has anybody been able to make Spark-Hive integration using the Hive Warehouse Connector (HWC) work in Zeppelin with impersonation in a Kerberized cluster?
I have followed the steps in this article
and Spark works with the HWC using spark-shell or pyspark from the console, but Spark in Zeppelin does not work with impersonation when following the given instructions, with either the %spark2 or the %livy2 interpreter.
In the case of the %spark2 interpreter, it works ONLY if I disable impersonation and explicitly configure Zeppelin's Kerberos principal in the interpreter via the "spark.yarn.principal" and "spark.yarn.keytab" properties. In that case, however, all jobs run as the "zeppelin" user, which is not an option for me in a multi-user environment.
If I don't use HWC, Spark works perfectly with user impersonation in both interpreters.
But with the %livy interpreter, or when I enable impersonation in the %spark2 interpreter, I get an error indicating that Spark cannot create a SQL connection pool because Kerberos GSS initiation failed:
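For anyone comparing interpreter setups, here is a minimal Python sketch that reports which HWC-related Spark properties are missing from an interpreter's settings and whether an explicit keytab login ("spark.yarn.principal" plus "spark.yarn.keytab") is configured. The HWC property names are the ones commonly listed for HDP 3 and should be checked against the HWC documentation for your version. The helper name `summarize_interpreter`, the sample values, and the use of a `livy.` prefix for the Livy interpreter are assumptions for illustration only.

```python
# HWC-related Spark properties (assumed list; verify against your HDP docs).
HWC_KEYS = [
    "spark.sql.hive.hiveserver2.jdbc.url",
    "spark.sql.hive.hiveserver2.jdbc.url.principal",
    "spark.security.credentials.hiveserver2.enabled",
    "spark.datasource.hive.warehouse.metastoreUri",
    "spark.datasource.hive.warehouse.load.staging.dir",
    "spark.hadoop.hive.llap.daemon.service.hosts",
    "spark.hadoop.hive.zookeeper.quorum",
]

# Properties used for an explicit Kerberos login by the interpreter itself.
KEYTAB_KEYS = ["spark.yarn.principal", "spark.yarn.keytab"]


def summarize_interpreter(props, prefix=""):
    """Summarize a dict of interpreter properties.

    prefix is prepended to every key before lookup, e.g. "livy." when the
    properties are written as livy.spark.* in a Livy interpreter (assumption).
    """
    def is_set(key):
        value = props.get(prefix + key)
        return value is not None and str(value).strip() != ""

    return {
        "missing_hwc": [k for k in HWC_KEYS if not is_set(k)],
        "explicit_keytab_login": all(is_set(k) for k in KEYTAB_KEYS),
    }


# Hypothetical settings: placeholder principal, keytab path, and JDBC URL.
sample = {
    "spark.yarn.principal": "zeppelin@EXAMPLE.COM",
    "spark.yarn.keytab": "/path/to/zeppelin.keytab",
    "spark.sql.hive.hiveserver2.jdbc.url": "jdbc:hive2://zk-host:2181/",
}
summary = summarize_interpreter(sample)
print("explicit keytab login:", summary["explicit_keytab_login"])
print("missing HWC properties:", summary["missing_hwc"])
```

An explicit keytab login reported together with impersonation disabled corresponds to the only %spark2 setup that worked here, where every job runs as the "zeppelin" user.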
java.lang.RuntimeException: java.sql.SQLException: Cannot create PoolableConnectionFactory (Could not open client transport for any of the Server URI's in ZooKeeper: GSS initiate failed)
...
Caused by: java.sql.SQLException: Could not open client transport for any of the Server URI's in ZooKeeper: GSS initiate failed
    at shadehive.org.apache.hive.jdbc.HiveConnection.<init>(HiveConnection.java:333)
    ... 59 more
Caused by: org.apache.thrift.transport.TTransportException: GSS initiate failed
    at org.apache.thrift.transport.TSaslTransport.sendAndThrowMessage(TSaslTransport.java:232)
I have tried all the workarounds, checked the proxyuser configuration for both the zeppelin and livy principals in the Hadoop/HDFS core-site, and also defined both users as Livy superusers, but the problem persists.
I suspect it may not be possible to use Spark/Hive integration with user impersonation in the case of the %spark2 interpreter, but it has to be possible at least with the %livy interpreter.
Does anyone have an idea or suggestion about which extra configuration is needed to make HWC work inside Zeppelin?
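For reference, the proxyuser check mentioned above can be sketched in Python as follows. It parses a core-site.xml document and reports the `hadoop.proxyuser.<user>.hosts` and `hadoop.proxyuser.<user>.groups` values for each service user. The short user names "zeppelin" and "livy", the helper name `proxyuser_settings`, and the sample XML are assumptions for illustration; the real names depend on how the principals map to local users in your cluster.

```python
import xml.etree.ElementTree as ET


def proxyuser_settings(core_site_xml, users=("zeppelin", "livy")):
    """Return the hosts/groups proxyuser values found for each user.

    A value of None means the property is absent from the given XML.
    """
    root = ET.fromstring(core_site_xml)
    found = {}
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        found[name] = value

    report = {}
    for user in users:
        report[user] = {
            suffix: found.get(f"hadoop.proxyuser.{user}.{suffix}")
            for suffix in ("hosts", "groups")
        }
    return report


# Hypothetical core-site.xml content, not taken from a real cluster.
SAMPLE = """<configuration>
  <property><name>hadoop.proxyuser.zeppelin.hosts</name><value>*</value></property>
  <property><name>hadoop.proxyuser.zeppelin.groups</name><value>*</value></property>
  <property><name>hadoop.proxyuser.livy.hosts</name><value>*</value></property>
</configuration>"""

report = proxyuser_settings(SAMPLE)
for user, settings in report.items():
    print(user, settings)
```

This only confirms that the core-site entries exist. It does not verify that the running services have picked them up, nor does it cover the Livy superuser setting.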
Thanks in advance.
- Notably, this issue leaves a fundamental gap when it comes to leveraging Ranger authorization from Zeppelin.
- Impersonation is a must, since individual developers and end users need to use Zeppelin.
We have tested disabling user impersonation in the Livy interpreter, and HWC also works in that case. So the problem seems to be that we cannot use HWC with Spark from Zeppelin and impersonation at the same time.
This seems a HUGE drawback for using Zeppelin with HDP 3.x in a secured multi-user environment with policy-based access restrictions.