Access to Hive from Oozie java action with Kerberos
Labels: Apache Hive, Apache Oozie
Created 01-06-2016 09:12 PM
Hi,
I am trying to execute a Hive query from a java action that is part of an Oozie workflow. My preferred way is to use Beeline or JDBC rather than the old Hive CLI. However, I am struggling with this a little, since the connection fails with authentication errors. When the java action code is executed on some data node in the cluster, Oozie does not do a kinit for the user, and thus the connection fails. Both Beeline and the JDBC connection string seem to support delegation tokens, but those tokens can only be obtained when the user is logged in via Kerberos (i.e. has run kinit).
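For reference, the two JDBC URL forms in question differ only in their session parameters. The sketch below just builds both as plain strings; `HiveJdbcUrls` is a hypothetical helper, and the host, port, database and the `hive/_HOST@EXAMPLE.COM` principal are placeholders. Actually opening a connection would additionally need the Hive JDBC driver on the classpath (`java.sql.DriverManager.getConnection(url)`), plus either a Kerberos TGT for the first form or a HiveServer2 delegation token in the current user's credentials for the second. The `principal=` and `auth=delegationToken` parameters follow the Hive JDBC conventions as I understand them; check them against your Hive version.

```java
import java.util.Objects;

// Hypothetical helper: builds HiveServer2 JDBC URLs as plain strings.
// Host, port, database and principal values are placeholders.
final class HiveJdbcUrls {

    private static String base(String host, int port, String db) {
        Objects.requireNonNull(host, "host");
        Objects.requireNonNull(db, "db");
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    // Kerberos form: the client must already hold a TGT (e.g. after kinit);
    // "principal" names the HiveServer2 service principal, not the end user.
    static String kerberos(String host, int port, String db, String hs2Principal) {
        return base(host, port, db) + ";principal=" + hs2Principal;
    }

    // Delegation-token form: no TGT is used at connect time, but a HiveServer2
    // delegation token must already be present in the current user's credentials.
    static String delegationToken(String host, int port, String db) {
        return base(host, port, db) + ";auth=delegationToken";
    }

    public static void main(String[] args) {
        System.out.println(kerberos("hs2.example.com", 10000, "default", "hive/_HOST@EXAMPLE.COM"));
        System.out.println(delegationToken("hs2.example.com", 10000, "default"));
    }
}
```

The problem described above is precisely that, inside the java action, neither precondition holds: there is no TGT, and nothing has put a HiveServer2 token into the credentials.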
We are currently using hive-0.14.0 and oozie-4.1.
I have found that the new hive2 action introduced in oozie-4.2 seems to first create a JDBC connection under Oozie's Kerberos login, obtain a delegation token from this connection, and finally pass this token to Beeline. Maybe the same approach could be used here as well. It would require a new custom Oozie action (e.g. java-jdbc). That seems possible, but it is quite complicated; is there some easier way?
Thanks for any comments,
Pavel
Created 01-07-2016 02:05 PM
You could use a shell action, ship the user's keytab with the Oozie files (the file tag) and do the kinit yourself before running the java command. Obviously it is not that elegant, and you have a keytab sitting somewhere in HDFS, but it should work. I did something similar with a shell action that ran a kinit and then a Scala program (not against Hive, but connecting to HDFS after the kinit). Moreover, as I always say, I would suggest using a Hive server with LDAP/PAM authentication: Beeline and the hive2 action now have a password file option, which makes life so much easier. As a database guy, I find that Kerberos for a JDBC connection always causes problems.
Here is the Oozie shell action, by the way:

<shell xmlns="uri:oozie:shell-action:0.1">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <exec>runJavaCommand.sh</exec>
    <file>${nameNode}/scripts/runJavaCommand.sh#runJavaCommand.sh</file>
    <file>${nameNode}/securelocation/user.keytab#user.keytab</file>
</shell>

Then just add a kinit to the script before running java:

kinit -kt user.keytab user@EXAMPLE.COM
java org.apache.myprogram
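The same kinit step could also be run from inside the Java program itself instead of the wrapper script. This is only a sketch: `KinitRunner` is a hypothetical name, it assumes the `kinit` binary is on the node's PATH and that `user.keytab` was localized into the working directory by the `<file>` element above, and whether Hadoop's login in the same JVM picks up the fresh ticket cache depends on when that login first happens.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: runs "kinit -kt <keytab> <principal>" from Java,
// mirroring the kinit line in the shell wrapper script.
final class KinitRunner {

    // Builds the command without running it (useful for logging and testing).
    static List<String> kinitCommand(String keytab, String principal) {
        return Arrays.asList("kinit", "-kt", keytab, principal);
    }

    // Runs kinit, forwarding its output to this process, and fails on a non-zero exit.
    static void kinit(String keytab, String principal) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(kinitCommand(keytab, principal))
                .inheritIO()
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("kinit exited with status " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumes user.keytab was shipped with the action via an Oozie <file> element.
        kinit("user.keytab", "user@EXAMPLE.COM");
        // ... then open the Kerberos-authenticated JDBC connection ...
    }
}
```

Either way, the approach still requires the user's keytab to be readable by the workflow.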
Created 01-07-2016 03:35 AM
@Pavel Benes It's more of a hack, but maybe you could use a shell action to call your java command and then capture the action's output?
Created 01-07-2016 08:34 AM
@Artem Ervits Thanks for the reply. In my opinion it would not help: the shell action is the same as the java one with respect to Kerberos login, so a delegation token is still required to connect via JDBC. The only way I see is to make the initial JDBC connection within the Oozie action handler/executor, which runs under Oozie's Kerberos login, and pass the delegation token to the actual java action code running on a data node. But maybe I am missing something.
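For context on where such a token would have to end up: inside a YARN container, Hadoop points to the job's serialized credentials (the delegation tokens shipped with the job) through the `HADOOP_TOKEN_FILE_LOCATION` environment variable, and Oozie's java action documentation suggests copying that path into `mapreduce.job.credentials.binary` when the action itself launches MapReduce jobs. The sketch below only derives that setting from an environment map using the JDK; `TokenFilePropagation` is a hypothetical name, and getting a HiveServer2 token into that file in the first place is exactly the open problem here.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: finds the container's token file and returns the job
// configuration entry a child MapReduce job would read its credentials from.
final class TokenFilePropagation {

    // Set by Hadoop in the container environment; points at the serialized credentials.
    static final String TOKEN_FILE_ENV = "HADOOP_TOKEN_FILE_LOCATION";
    // Job configuration key for a pre-serialized credentials file.
    static final String CREDENTIALS_KEY = "mapreduce.job.credentials.binary";

    static Optional<Map.Entry<String, String>> credentialsSetting(Map<String, String> env) {
        String path = env.get(TOKEN_FILE_ENV);
        if (path == null || path.isEmpty()) {
            return Optional.empty(); // not running in a container with shipped tokens
        }
        Map.Entry<String, String> entry = new AbstractMap.SimpleImmutableEntry<>(CREDENTIALS_KEY, path);
        return Optional.of(entry);
    }

    public static void main(String[] args) {
        Optional<Map.Entry<String, String>> setting = credentialsSetting(System.getenv());
        if (setting.isPresent()) {
            System.out.println(setting.get().getKey() + "=" + setting.get().getValue());
        } else {
            System.out.println(TOKEN_FILE_ENV + " is not set");
        }
    }
}
```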
Created 01-07-2016 12:20 PM
This calls for an expert, @bsaini
Created 01-07-2016 08:00 PM
@Benjamin Leonhardi Thanks for the reply. However, as I tried to describe above, I cannot do a kinit for the user, since I do not have access to his keytab at all. I am not sure, but in theory I could maybe do a kinit as some service user with the ability to impersonate users on Hadoop (e.g. like oozie) and use doAs() to access Hive or obtain a delegation token. I am not a Kerberos expert, but giving normal users access to this keytab still feels like a security hole.
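On the impersonation idea: as far as I know, HiveServer2 can run a session on behalf of another user through the `hive.server2.proxy.user` JDBC URL parameter, provided the connecting principal is allowed to impersonate via Hadoop's `hadoop.proxyuser.<service-user>.hosts` / `.groups` settings. A minimal sketch of building such a URL follows; `ProxyUserUrl` is hypothetical, the service user would still need its own Kerberos login (e.g. from a keytab only that service can read), and the parameter should be verified against your Hive 0.14 build.

```java
// Hypothetical helper: appends HiveServer2's proxy-user parameter to a Kerberos JDBC URL.
// The connecting (service) principal must be allowed to impersonate endUser.
final class ProxyUserUrl {

    static String withProxyUser(String kerberosUrl, String endUser) {
        if (endUser == null || endUser.isEmpty()) {
            throw new IllegalArgumentException("end user must not be empty");
        }
        // ';' separates URL parameters and '=' separates key from value: reject both
        if (endUser.contains(";") || endUser.contains("=")) {
            throw new IllegalArgumentException("invalid user name: " + endUser);
        }
        return kerberosUrl + ";hive.server2.proxy.user=" + endUser;
    }

    public static void main(String[] args) {
        String base = "jdbc:hive2://hs2.example.com:10000/default;principal=hive/_HOST@EXAMPLE.COM";
        System.out.println(withProxyUser(base, "alice"));
    }
}
```

Rejecting `;` and `=` in the user name is a deliberate design choice: since the name is spliced into the URL, it must not be able to smuggle in extra session settings.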
Created 01-07-2016 09:40 PM
@Benjamin Leonhardi Regarding the LDAP/PAM scenario you mention: I am not familiar with the details, but I am afraid our users expect single sign-on, so they won't be willing to enter their intranet password again into some "custom" system.
Created 02-16-2016 04:42 PM
Did you find a solution to your problem? What I don't understand is this: is it necessary to do the kinit inside the Java code (with UserGroupInformation and so on) if you want to connect to Hive with Kerberos?
Created 02-16-2016 08:10 PM
@Guillermo Ortiz Not really. I have split the original java action into two Oozie actions: the first is a hive action where I get what I need from Hive (using a temporary external table), and the second is a java action where the data are processed further. Currently I use the hive action, but it should be trivial to replace it with a hive2 action in the future when needed.
And yes, as far as I know, to connect to a Kerberized Hive from Java you need either a valid Kerberos ticket (the kinit does not have to happen in Java, though) or a delegation token.
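With the split approach, the second (java) action only has to read the files the hive action left behind, so it needs no Hive connection at all. Assuming the temporary external table is a plain TEXTFILE table with Hive's default delimiters (fields separated by Ctrl-A, i.e. `\u0001`, and NULL written as `\N`), parsing a line needs nothing beyond the JDK. `HiveTextRow` is a hypothetical helper, and reading the files from HDFS is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: parses one line of a Hive TEXTFILE table that uses
// Hive's default delimiters (assumption: no custom ROW FORMAT was specified).
final class HiveTextRow {

    // Default field delimiter: Ctrl-A (0x01).
    static final String FIELD_DELIMITER = "\u0001";
    // Default textual representation of NULL.
    static final String NULL_MARKER = "\\N";

    static List<String> parse(String line) {
        // limit -1 keeps trailing empty fields instead of silently dropping them
        String[] raw = line.split(FIELD_DELIMITER, -1);
        List<String> fields = new ArrayList<>(raw.length);
        for (String value : raw) {
            fields.add(NULL_MARKER.equals(value) ? null : value);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = String.join(FIELD_DELIMITER, Arrays.asList("42", "\\N", "hello"));
        System.out.println(parse(line)); // [42, null, hello]
    }
}
```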
Created 02-17-2016 03:14 PM
Is it just a problem with JDBC/Hive/the java action and Oozie? If you wanted to query HBase or anything else with Kerberos, would you have the same problem?