How to run oozie shell action for hive queries in kerberos-enabled cluster?
Labels: Apache Hive, Apache Oozie
Created ‎07-19-2016 06:59 PM
Hi, we're kerberizing our HDP cluster. As part of that process, we kerberized our QA cluster and are testing all our Oozie workflows in the Kerberos environment. We were able to run Java and Hive actions successfully, but are stuck on shell actions where we run a Hive query inside a shell action. We've tried multiple approaches, but none of them works.
Here is what we tried:
- Approach #1: Added a "credentials" section to our workflow, similar to what we do for Hive actions.
- Approach #2: Running kinit inside the shell script before launching the Hive CLI.
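Approach #2 can be sketched roughly as below. The principal, keytab path, and query are placeholders based on this thread, not verified values, and the commands are wrapped in a function so nothing executes when the file is sourced:

```shell
#!/bin/bash
# Sketch of Approach #2: run kinit inside the Oozie shell action before
# launching the Hive CLI. Principal, keytab path, and query are
# hypothetical placeholders.
run_hive_with_kinit() {
    # Obtain a Kerberos TGT from a keytab available on the local file system
    kinit foo@TEST.COM -k -t /etc/security/keytabs/foo.headless.keytab || return 1

    # Launch the Hive CLI; in our case this still failed with the
    # "Delegation Token can be issued only with kerberos or web
    # authentication" error shown below
    hive -S -e "select MAX(update_time) from test_db.test_table;"
}
```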
We get this error with both approaches:
```
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6744)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:628)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:987)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:507)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:680)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:624)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): Delegation Token can be issued only with kerberos or web authentication
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getDelegationToken(FSNamesystem.java:6744)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getDelegationToken(NameNodeRpcServer.java:628)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getDelegationToken(ClientNamenodeProtocolServerSideTranslatorPB.java:987)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2151)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2147)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2145)
    at org.apache.hadoop.ipc.Client.call(Client.java:1427)
    at org.apache.hadoop.ipc.Client.call(Client.java:1358)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy14.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getDelegationToken(ClientNamenodeProtocolTranslatorPB.java:933)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:252)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:104)
    at com.sun.proxy.$Proxy15.getDelegationToken(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.getDelegationToken(DFSClient.java:1043)
    at org.apache.hadoop.hdfs.DistributedFileSystem.getDelegationToken(DistributedFileSystem.java:1552)
    at org.apache.hadoop.fs.FileSystem.collectDelegationTokens(FileSystem.java:530)
    at org.apache.hadoop.fs.FileSystem.addDelegationTokens(FileSystem.java:508)
    at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2238)
    at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:107)
    at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystemsInternal(TokenCache.java:86)
    at org.apache.tez.common.security.TokenCache.obtainTokensForFileSystems(TokenCache.java:76)
    at org.apache.tez.client.TezClientUtils.setupTezJarsLocalResources(TezClientUtils.java:200)
    at org.apache.tez.client.TezClient.getTezJarResources(TezClient.java:845)
    at org.apache.tez.client.TezClient.start(TezClient.java:380)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:196)
    at org.apache.hadoop.hive.ql.exec.tez.TezSessionState.open(TezSessionState.java:117)
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:504)
    ... 8 more
```
We also followed this blog: using-oozie-in-kerberized-cluster (see bullet point #7 there).
Please advise. Thanks much.
Created ‎07-19-2016 07:08 PM
@Sunile Manjee thanks much for your response.
We did try kinit inside the shell script and still got the same error. The only difference is that we do not have the keytab on HDFS; we pushed it to all the cluster nodes, so it is available on the local file system.

```
kinit foo@TEST.COM -k -t /etc/security/keytabs/foo.headless.keytab
```
Created ‎07-19-2016 07:08 PM
@Gopichand Mummineni this is how I would do it. I got much of this from @Benjamin Leonhardi's feedback.
Shell Action
- This option requires the Hive client to be installed on all nodes
- Store keytabs on HDFS
- Secure them via Ranger/ACLs/chmod
- Use the file tab to identify the HDFS keytab location
- When the Oozie shell action runs, it will download the keytab to the local YARN directory
- Run kinit inside the shell script
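The steps above might look roughly like the workflow fragment below; all paths, the keytab location, and the script name are illustrative assumptions, not values from this thread:

```xml
<!-- Hypothetical sketch of the shell action described above; paths and
     names are placeholders, not from this thread. -->
<action name="hive-shell">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>run_hive.sh</exec>
        <!-- Ship the script and the HDFS-stored keytab; Oozie downloads
             both into the action's local YARN working directory, where
             the script can kinit against the keytab -->
        <file>run_hive.sh</file>
        <file>/user/foo/keytabs/foo.headless.keytab</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```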
Created ‎07-19-2016 07:18 PM
@Gopichand Mummineni my understanding is that it has to run as oozie and not foo@*. @Benjamin Leonhardi please confirm or correct this understanding.
Created ‎07-27-2016 12:09 AM
@Gopichand Mummineni - I got it working. I will write a blog and update you shortly.
Created ‎07-29-2016 05:28 AM
I got this working.
Please refer https://community.hortonworks.com/content/kbentry/48132/oozie-shell-action-run-hive-query-in-shell-s...
Created ‎07-29-2016 04:14 PM
Thanks for the response @Kuldeep Kulkarni.
I followed the exact same approach you explained in your blog. The only difference was that we use Tez as our execution engine for Hive.
Anyhow, I tried this first and it didn't work. I believe it is because I am changing the execution engine to mr within the Hive CLI, which is too late: Hive attempts to launch the Tez ApplicationMaster when the CLI itself starts, even before the query runs.
```
hive -e "SET hive.execution.engine=mr; SET mapreduce.job.credentials.binary=${HADOOP_TOKEN_FILE_LOCATION}; select MAX(update_time) from test_db.test_table;" -S
```
Then I changed my command to pass --hiveconf, so that the execution engine is set to mr before Hive attempts to launch the Tez ApplicationMaster. This one worked!
```
hive -e "SET mapreduce.job.credentials.binary=${HADOOP_TOKEN_FILE_LOCATION}; select MAX(update_time) from test_db.test_table;" -S --hiveconf hive.execution.engine=mr
```
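Putting the working pieces of this thread together, the shell-action script might look like the sketch below. The principal, keytab path, and table are placeholders from this thread, the exact script is an assumption, and the commands are wrapped in a function so nothing executes when the file is sourced:

```shell
#!/bin/bash
# Sketch of the working shell-action script; principal, keytab path, and
# table names are hypothetical placeholders from this thread.
run_query_as_mr() {
    # Authenticate from the keytab shipped to (or present on) the node
    kinit foo@TEST.COM -k -t /etc/security/keytabs/foo.headless.keytab || return 1

    # --hiveconf sets the engine to mr BEFORE the CLI starts, so Hive never
    # tries to launch a Tez ApplicationMaster. mapreduce.job.credentials.binary
    # points the MR job at the delegation-token file YARN exposes via
    # HADOOP_TOKEN_FILE_LOCATION inside the container.
    hive -S --hiveconf hive.execution.engine=mr \
         -e "SET mapreduce.job.credentials.binary=${HADOOP_TOKEN_FILE_LOCATION}; select MAX(update_time) from test_db.test_table;"
}
```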
I also tried the other options listed below; these didn't work either.
```
hive -e "select MAX(update_time) from test_db.test_table;" -S --hiveconf tez.credentials.path=${HADOOP_TOKEN_FILE_LOCATION}
hive -e "SET tez.credentials.path=${HADOOP_TOKEN_FILE_LOCATION}; select MAX(update_time) from test_db.test_table;" -S
```
I am curious to understand why passing the credentials to Tez won't work. Are you aware of an existing open Apache bug for this? Understanding this better will help me in the future.
Thanks again.
