We have a Kerberised cluster and I'm trying to run a Java action in Oozie where I make a JDBC connection to Hive. This JDBC connections works fine on the Sandbox without Kerberos.
The connection string is as simple as the following, where I'm providing username and password in it:
Connection con = DriverManager.getConnection("jdbc:hive2://W12345:10000/control;principal=hive/W12345.companynet.net@COMPANYNET.NET","user123","passw123");
The Oozie action completes succesfully, and the Java log does not present any error:
1742 [main] INFO org.apache.hive.jdbc.Utils - Supplied authorities: W12345:10000 1742 [main] INFO org.apache.hive.jdbc.Utils - Resolved authority: W12345:10000 1766 [main] INFO org.apache.hive.jdbc.HiveConnection - Will try to open client transport with JDBC Uri: jdbc:hive2://W12345:10000/control;principal=hive/W12345.companynet.net@COMPANYNET.NET <<< Invocation of Main class completed <<< Oozie Launcher ends 1785 [main] INFO org.apache.hadoop.mapred.Task - Task:attempt_1464245290012_0129_m_000000_0 is done. And is in the process of committing 1847 [main] INFO org.apache.hadoop.mapred.Task - Task attempt_1464245290012_0129_m_000000_0 is allowed to commit now 1854 [main] INFO org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Saved output of task 'attempt_1464245290012_0129_m_000000_0' to hdfs://danskehadoop/user/user123/oozie-oozi/0000013-160527101253015-oozie-oozi-W/JavaAction--java/output/_temporary/1/task_1464245290012_0129_m_000000 1909 [main] INFO org.apache.hadoop.mapred.Task - Task 'attempt_1464245290012_0129_m_000000_0' done.
But the Java main does not actually complete the execution as the JDBC connection fails with an exception that I can see only in the Hive log:
ERROR [HiveServer2-Handler-Pool: Thread-78363]: server.TThreadPoolServer (TThreadPoolServer.java:run(296)) - Error occurred during processing of message. java.lang.RuntimeException: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:739) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory$1.run(HadoopThriftAuthBridge.java:736) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:360) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1637) at org.apache.hadoop.hive.thrift.HadoopThriftAuthBridge$Server$TUGIAssumingTransportFactory.getTransport(HadoopThriftAuthBridge.java:736) at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:268) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.thrift.transport.TSaslTransportException: No data or no sasl data in the stream at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:328) at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41) at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216) ... 10 more
Thanks in advance for the help!
You can try to add the Hive2 credential to your java action but I am afraid that it is not supported for a generic Java action ( still worth a try),
if you need to get it yourself you would have to programatically read the keytab and get the ticket yourself:
And finally why don't you use a more normal authentication mechanism for hive like PAM/LDAP ( ceterum censeo cartaginem delendam esse )
Hi Benjamin, Thank you for your reply.
Option 1 - did not work, as expected 🙂
Option 2 - In the link, at step b2, there's this command:
Does this mean that my user (later on it will be a system user) needs to have a keytab created on the linux file system and distributed to all the nodes? Moreover, it might not be a great option, but isn't this authentication possible using only username/password ?
Option 3 - I'm using the current mechanism as it is the only one I found some examples on the net. I checked shortly on PAM/LDAP, I'm not sure yet if that will require some changes from the Hadoop cluster side. If not, I'll be happy to try it
"Does this mean that my user (later on it will be a system user) needs to have a keytab created on the linux file system and distributed to all the nodes?"
You would put the keytab in HDFS with access rights for only the user and use the oozie files tag to load it to your temp execution directory,
"Moreover, it might not be a great option, but isn't this authentication possible using only username/password ?"
To do this you need PAM or LDAP authentication, thats why I mentioned it :-). You can either hardcode it or do the same thing we discussed above with a password file in hdfs. For this you can set access rights.
"Option 3 - I'm using the current mechanism as it is the only one I found some examples on the net. I checked shortly on PAM/LDAP, I'm not sure yet if that will require some changes from the Hadoop cluster side. If not, I'll be happy to try it"
Hi Antonio Arfe
Can you connect to HiveServer2 via beeline with your connection string?
If you can, the error may suggest that an yarn app is not starting for some reason. You can verify that by visiting resource manager UI.
Hi Tahahiko, Yes, it works from beeline (without username/password), but I believe it is because I connect to beeline via CLI, which means I already have a Kerberos authentication for my username. I'm not sure what I should look for in Resource UI, it just shows the successful map/reduce job connected to my Java action, the log does not seem to contain much.