Access to Hive from Oozie java action with Kerberos

Rising Star

Hi,

I am trying to execute a Hive query from a java action that is part of an Oozie workflow. My preferred way is to use Beeline or JDBC rather than the old Hive CLI. However, I am struggling with this a bit, since the connection fails with authentication errors. When the java action code is executed on a data node in the cluster, Oozie does not run kinit for the user, so the connection fails. Both Beeline and the JDBC connection string seem to support delegation tokens, but those tokens can only be obtained when the user is logged in via Kerberos (kinited).

We are currently using hive-0.14.0 and oozie-4.1.

I have found out that the new hive2 action introduced in oozie-4.2 seems to first create a JDBC connection under the Oozie Kerberos login, obtain a delegation token from this connection, and finally pass this token to Beeline. Maybe the same approach could be used here as well. It would require a new custom Oozie action (e.g. java-jdbc). That seems possible, but it is quite complicated; is there an easier way?
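To make the two connection modes mentioned above concrete, here is a minimal sketch of the two HiveServer2 JDBC URL forms: one for a Kerberos login (the caller must already hold a ticket) and one for a client that presents a delegation token. The class name, host, port, database and principal are placeholders for illustration only. Obtaining the token and adding it to the job credentials is not shown, because that part needs the Hive and Hadoop client classes.

```java
// Sketch only: host, port, database and principal values are hypothetical.
final class HiveJdbcUrls {

    /** URL for a Kerberos login; serverPrincipal is the HiveServer2 service principal. */
    static String kerberosUrl(String host, int port, String db, String serverPrincipal) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db + ";principal=" + serverPrincipal;
    }

    /** URL for a client that authenticates with a previously obtained delegation token. */
    static String delegationTokenUrl(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db + ";auth=delegationToken";
    }

    public static void main(String[] args) {
        // Print both forms for a made-up HiveServer2 endpoint.
        System.out.println(kerberosUrl("hs2.example.com", 10000, "default", "hive/_HOST@EXAMPLE.COM"));
        System.out.println(delegationTokenUrl("hs2.example.com", 10000, "default"));
    }
}
```

The first form only works where a Kerberos ticket is available (for example on the Oozie server), while the second is what code running on a data node would use once a token has been shipped with the job.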

Thanks for any comments,

Pavel

1 ACCEPTED SOLUTION

Master Guru

You could use a shell action, add the keytab to the Oozie files (file tag), and do the kinit yourself before running the java command. Obviously that is not very elegant, and you have a keytab sitting somewhere in HDFS, but it should work. I did something similar with a shell action that ran a kinit before starting a Scala program (not against Hive, but running kinit and then connecting to HDFS).

Ceterum censeo, I would always suggest using a Hive server with LDAP/PAM authentication. Beeline and the hive2 action have a password file option now, and it makes life so much easier. As a database guy, I find that Kerberos for a JDBC connection just always causes problems.

Here is the Oozie shell action, by the way.

<shell xmlns="uri:oozie:shell-action:0.1">
   <job-tracker>${jobTracker}</job-tracker>
   <name-node>${nameNode}</name-node>
   <exec>runJavaCommand.sh</exec>
   <file>${nameNode}/scripts/runJavaCommand.sh#runJavaCommand.sh</file>
   <file>${nameNode}/securelocation/user.keytab#user.keytab</file>
</shell>

Then just add a kinit to the script before running java:
kinit -kt user.keytab user@EXAMPLE.COM
java org.apache.myprogram


Rising Star

@Guillermo Ortiz I would say it is an Oozie/Kerberos problem. If I wanted to call HBase from Oozie (there is probably no action for it), I would end up with the same problem.

Master Guru

@Pavel Benes

I know it has been forever, but it's possible to provide a password file to Beeline, so they might not have to log on. Similar to a keytab, the file would be stored in their account.

Rising Star

The problem is that our system does not have access to the user's password or keytab. It uses Kerberos authentication and then a Hadoop proxy user to access various Hadoop services. So it is not possible for us to do kinit again on a data node or to use a password (in a file or directly).

Contributor

I am also trying to execute a Hive query through an Oozie java action in a kerberized environment (https://community.hortonworks.com/questions/23857/executing-hive-queries-through-oozie-java-action-o.html). I tried the above solution, but I am still facing the issue.


@Pavel Benes - Can you modify the source of the program that gets executed by the java action? If so, you can include the kinit as part of the java code.

See the link below:

https://community.hortonworks.com/questions/1807/c...
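One way to do the kinit from inside the java code is to call the kinit binary through ProcessBuilder, mirroring the command from the shell script in the accepted solution. This is only a sketch under a few assumptions: the class name KinitRunner is made up, kinit is on the PATH of the node that runs the action, the keytab has been shipped with the action, and the ticket cache that kinit writes is one the JVM can read afterwards.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

// Sketch only: assumes kinit is installed and the keytab sits in the working directory.
final class KinitRunner {

    /** Builds the same command as the shell script: kinit -kt <keytab> <principal>. */
    static List<String> kinitCommand(String keytabPath, String principal) {
        return Arrays.asList("kinit", "-kt", keytabPath, principal);
    }

    /** Runs kinit and returns its exit code (0 means a ticket was obtained). */
    static int runKinit(String keytabPath, String principal)
            throws IOException, InterruptedException {
        Process process = new ProcessBuilder(kinitCommand(keytabPath, principal))
                .inheritIO() // forward kinit's messages to the action's stdout/stderr
                .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Keytab name and principal are the example values from the shell script.
        int exitCode = runKinit("user.keytab", "user@EXAMPLE.COM");
        if (exitCode != 0) {
            throw new IllegalStateException("kinit failed with exit code " + exitCode);
        }
        // Open the Hive or HDFS connection here, after the ticket has been obtained.
    }
}
```

Like the shell variant, this only helps when the workflow actually has access to a keytab.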

Rising Star

@bsaini I can modify the action code, but I cannot do the kinit there, since I do not have access to the user's keytab at all. My scenario is like this:

  • the user is logged in to the company network (with Kerberos)
  • the user accesses the REST API of some application server (authenticated using Kerberos)
  • the application server runs an Oozie workflow that includes the java task, which needs to access some tables in Hive using the original user's credentials.

The only way I see is the delegation token. Even if Oozie supported kinit on a data node, it would still be no help, since the keytab/password is not available.

New Contributor

Did you get a solution?

Contributor

@Pavel Benes Did you get a solution? I am facing a similar issue. My java application writes its result to HDFS and needs Kerberos authentication. I run into this issue when I schedule the application using Oozie.

Rising Star

@Padmanabhan Vijendran

Actually I did not, since the need passed. However, my question was more about access to Hive. In the case of HDFS it should be simpler.

In your java code you need to have something like this:

if (System.getenv("HADOOP_TOKEN_FILE_LOCATION") != null) {
    jobConf.set("mapreduce.job.credentials.binary", System.getenv("HADOOP_TOKEN_FILE_LOCATION"));
}

This tells your java app where to find the delegation token needed for HDFS access.

Hope this helps,

Pavel

Contributor

@Pavel Benes

Did you fix the Hive issues in the java action from Oozie?