How to run oozie shell action for hive queries in kerberos-enabled cluster?

New Contributor

Hi, we're kerberizing our HDP cluster. As part of that process, we kerberized our QA cluster and are testing all our Oozie workflows in the Kerberos environment. We were able to run Java and Hive actions successfully, but are stuck on shell actions where we run a Hive query inside the shell script. We've tried multiple approaches, but none of them works.

Here is what we tried:

  • Approach #1: Added a "credentials" section to our workflow, similar to what we do for Hive actions.
  • Approach #2: Doing kinit inside the shell script before launching the Hive CLI (a minimal sketch of such a script follows this list).
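
For reference, here is a minimal sketch of the Approach #2 wrapper script; the principal, keytab path, and query are the ones mentioned elsewhere in this thread, but treat them as placeholders:

#!/bin/bash
# Approach #2 (sketch): obtain a Kerberos ticket from a locally
# available keytab, then launch the Hive CLI.
KEYTAB=/etc/security/keytabs/foo.headless.keytab
PRINCIPAL=foo@TEST.COM

# Get a ticket; abort the action if kinit fails
kinit -kt "$KEYTAB" "$PRINCIPAL" || exit 1

# Run the Hive query
hive -S -e "select MAX(update_time) from test_db.test_table;"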

We get this error with both approaches:

Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6744)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:628)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:987)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)

at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:507)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6744)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:628)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:987)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)

at org.apache.hadoop.ipc.Client.call(Client.java:1427)
at org.apache.hadoop.ipc.Client.call(Client.java:1358)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy14.getDelegationToken(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:933)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
at com.sun.proxy.$Proxy15.getDelegationToken(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1043)
at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1552)
at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:530)
at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:508)
at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2238)
at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:107)
at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:86)
at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystems(TokenCache.java:76)
at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:200)
at org.apache.tez.client.TezClient.getTezJarResources(TezClient.java:845)
at org.apache.tez.client.TezClient.start(TezClient.java:380)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:504)
... 8 more

We also followed the blog post "using-oozie-in-kerberized-cluster"; please refer to bullet point #7 in that post.

Please advise. Thanks much.

1 ACCEPTED SOLUTION

6 REPLIES

New Contributor

@Sunile Manjee thanks much for your response.

We did try "kinit" inside the shell script. Still got the same error. But, the only difference here is that we do not have keytab on HDFS. We pushed the keytab to all the cluster nodes and it is available on local file system.

kinit foo@TEST.COM -k -t /etc/security/keytabs/foo.headless.keytab
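
For completeness, one way to confirm that the kinit itself succeeds before Hive runs (klist -s exits non-zero when there is no valid ticket in the cache):

kinit foo@TEST.COM -k -t /etc/security/keytabs/foo.headless.keytab
# Sanity check: make sure a valid ticket actually exists
if ! klist -s; then
    echo "kinit failed: no valid Kerberos ticket" >&2
    exit 1
fi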

Master Guru

@Gopichand Mummineni, this is how I would do it. Much of this comes from @Benjamin Leonhardi's feedback.

Shell Action

  • This option requires the Hive client to be installed on all nodes.
  • Store keytabs on HDFS.
    • Secure them via Ranger, HDFS ACLs, or chmod.
  • Use the file tag to identify the HDFS keytab location.
    • When the Oozie shell action runs, it will download the keytab to the local YARN directory.
    • kinit inside the shell script (see the sketch after this list).
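
A sketch of those steps, assuming a hypothetical HDFS path /user/foo/keytabs and the headless principal used earlier in this thread:

# 1. Store the keytab on HDFS and lock down its permissions
hdfs dfs -put foo.headless.keytab /user/foo/keytabs/
hdfs dfs -chown foo:foo /user/foo/keytabs/foo.headless.keytab
hdfs dfs -chmod 400 /user/foo/keytabs/foo.headless.keytab

# 2. Reference it in the shell action with a file tag, e.g.
#    <file>/user/foo/keytabs/foo.headless.keytab</file>
#    Oozie then downloads it into the YARN container's working directory.

# 3. Inside the shell script, kinit against the localized copy
kinit -kt foo.headless.keytab foo@TEST.COM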

Master Guru

@Gopichand Mummineni, my understanding is that it has to be run as oozie and not foo@*. @Benjamin Leonhardi, please confirm or correct this understanding.

Master Guru

@Gopichand Mummineni - I got it working. I will write a blog and update you shortly.

New Contributor

Thanks for the response, @Kuldeep Kulkarni.

I followed the exact same approach you explained in your blog. The only difference was that we had Tez as our execution engine for Hive.

I tried the command below first, and it didn't work. I believe that is because changing the execution engine to MR within the Hive CLI is too late: Hive attempts to launch the Tez ApplicationMaster when the CLI itself starts, even before the query runs.

hive -e "SET hive.execution.engine=mr; SET mapreduce.job.credentials.binary=${HADOOP_TOKEN_FILE_LOCATION}; select MAX(update_time) from test_db.test_table;" -S

Then I changed my command to use --hiveconf, so that the execution engine is switched to MR before Hive attempts to launch the Tez ApplicationMaster. This one worked!

hive -e "SET mapreduce.job.credentials.binary=${HADOOP_TOKEN_FILE_LOCATION}; select MAX(update_time) from test_db.test_table;" -S --hiveconf hive.execution.engine=mr

I also tried the other options listed below; these didn't work either.

hive -e "select MAX(update_time) from test_db.test_table;" -S --hiveconf tez.credentials.path=${HADOOP_TOKEN_FILE_LOCATION} 
hive -e "SET tez.credentials.path=${HADOOP_TOKEN_FILE_LOCATION}; select MAX(update_time) from test_db.test_table;" -S

I am curious to understand why passing the credentials to Tez doesn't work. Are you aware of an existing open Apache bug for this? Understanding this better will help me in the future.

Thanks again.