Access to Hive from Oozie java action with Kerberos

Rising Star

Hi,

I am trying to execute a Hive query from a java action that is part of an Oozie workflow. I would prefer to use Beeline or JDBC rather than the old Hive CLI. However, I am struggling with this, because the connection fails with authentication errors. When the java action code is executed on a data node in the cluster, Oozie does not run kinit for the user, so the connection fails. Both Beeline and the JDBC connection string seem to support delegation tokens, but those tokens can only be obtained when the user is already logged in via Kerberos (kinited).

We are currently using hive-0.14.0 and oozie-4.1.

I have found that the new hive2 action introduced in oozie-4.2 seems to first create a JDBC connection under the Oozie Kerberos login, obtain a delegation token from this connection, and finally pass this token to Beeline. Maybe the same approach could be used here as well. It would require a new custom Oozie action (e.g. java-jdbc). That seems possible but quite complicated; is there an easier way?
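For reference, the two authentication styles discussed here differ only in the session variables appended to the HiveServer2 JDBC URL. The following is a minimal sketch, not the hive2 action's actual code: the class name `HiveJdbcUrls`, the host `hs2.example.com`, and the principal are hypothetical placeholders, while `principal=` and `auth=delegationToken` are the URL parameters the Hive JDBC driver documents. The sketch only builds the strings. Actually connecting would also need the Hive JDBC driver on the classpath and, for the token variant, a HiveServer2 delegation token already present in the caller's Hadoop credentials.

```java
final class HiveJdbcUrls {

    // URL for a caller that already holds a Kerberos ticket (e.g. after kinit).
    // serverPrincipal is the HiveServer2 service principal, not the end user.
    static String kerberosUrl(String host, int port, String db, String serverPrincipal) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db
                + ";principal=" + serverPrincipal;
    }

    // URL for a caller that has no ticket but was handed a delegation token.
    static String delegationTokenUrl(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db
                + ";auth=delegationToken";
    }

    public static void main(String[] args) {
        // Host, port, database, and principal below are placeholders.
        System.out.println(kerberosUrl("hs2.example.com", 10000, "default",
                "hive/_HOST@EXAMPLE.COM"));
        System.out.println(delegationTokenUrl("hs2.example.com", 10000, "default"));
    }
}
```

Either string would be passed to `java.sql.DriverManager.getConnection(url)`; which one works depends on whether the process running the code has a ticket or a token available.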

Thanks for any comments,

Pavel

1 ACCEPTED SOLUTION

Master Guru

You could use a shell action, add the keytab to the Oozie files (file tag), and do the kinit yourself before running the java command. Obviously that is not very elegant, and you have a keytab sitting somewhere in HDFS, but it should work. I did something similar with a shell action that ran a kinit before running a Scala program (not against Hive, but running kinit and then connecting to HDFS). Ceterum censeo: I would always suggest using a Hive server with LDAP/PAM authentication. Beeline and the hive2 action have a password file option now, and it makes life so much easier. As a database guy, I find that Kerberos for a JDBC connection just always causes problems.

Here is the Oozie shell action, by the way.

<shell xmlns="uri:oozie:shell-action:0.1">
   <job-tracker>${jobTracker}</job-tracker>
   <name-node>${nameNode}</name-node>
   <exec>runJavaCommand.sh</exec>
   <file>${nameNode}/scripts/runJavaCommand.sh#runJavaCommand.sh</file>
   <file>${nameNode}/securelocation/user.keytab#user.keytab</file>
</shell>

Then just add a kinit to the script before running java:
kinit -kt user.keytab user@EXAMPLE.COM
java org.apache.myprogram


19 REPLIES

Master Mentor

@Pavel Benes This is more of a hack, but maybe you can try using a shell action to call your java command and then capture the output of the action?

Rising Star

@Artem Ervits Thanks for the reply. In my opinion it would not help. The shell action is the same as the java one with respect to Kerberos login, so a delegation token is still required to connect via JDBC. The only way I see is to make the initial JDBC connection within the Oozie action handler/executor, which runs under kinit, and pass the delegation token to the actual java action code running on a data node. But maybe I am missing something.

Master Mentor

This calls for an expert, @bsaini


Rising Star

@Benjamin Leonhardi Thanks for the reply. However, as I tried to describe above, I cannot do kinit for the user since I do not have access to his keytab at all. I am not sure, but maybe in theory I could do kinit as some service user with the ability to impersonate users on Hadoop (e.g. like oozie) and use doAs() to get access to Hive or obtain a delegation token. I am not a Kerberos expert, but it still feels like a security hole to give normal users access to such a keytab.

Rising Star

@Benjamin Leonhardi Regarding the LDAP/PAM scenario you mention, I am not familiar with the details, but I am afraid that our users expect single sign-on, so they won't be willing to enter their intranet password again into some "custom" system.

New Contributor

Did you find a solution to your problem? What I don't understand is this: is it necessary to do the kinit inside the Java code, with UserGroupInformation and so on, if you want to connect to Hive with Kerberos?

Rising Star

@Guillermo Ortiz Not really. I have split the original java action into two Oozie actions: the first is a hive action where I get what I need from Hive (using a temporary external table), and the second is a java action where the data is further processed. Currently I use the hive action, but it should be trivial to replace it with the hive2 action in the future when needed.

And yes, to my knowledge it is necessary to have a valid Kerberos ticket (the kinit does not have to happen in Java, though) or to use a delegation token in order to connect to a Kerberized Hive from Java.
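One way to express these two options in code is to check what the action's environment offers before picking an authentication mode. The sketch below is only an illustration under stated assumptions: the class `AuthModeProbe` and its `choose` method are hypothetical, while `HADOOP_TOKEN_FILE_LOCATION` is the environment variable Hadoop sets inside a YARN container to point at the job's token file. Its presence only says that some delegation tokens were shipped with the job, not that a HiveServer2 token is among them.

```java
import java.util.Map;

final class AuthModeProbe {

    enum Mode { DELEGATION_TOKEN, KERBEROS_TICKET }

    // Prefer a delegation token when the container was given a token file;
    // otherwise assume the caller must already hold a Kerberos ticket.
    static Mode choose(Map<String, String> env) {
        String tokenFile = env.get("HADOOP_TOKEN_FILE_LOCATION");
        if (tokenFile != null && !tokenFile.isEmpty()) {
            return Mode.DELEGATION_TOKEN;
        }
        return Mode.KERBEROS_TICKET;
    }

    public static void main(String[] args) {
        // Inspect the real process environment when run directly.
        System.out.println(choose(System.getenv()));
    }
}
```

Taking the environment as a `Map` argument rather than calling `System.getenv()` inside `choose` keeps the decision easy to exercise outside a cluster.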

New Contributor

Is it just a problem with JDBC/Hive/java action and Oozie? If you wanted to query HBase or some other service with Kerberos, would you have the same problem?