Created 07-21-2016 03:53 PM
Hi,
I am running a Spark application on a Kerberized HDP platform. The application connects to HBase and reads and writes data perfectly well in local mode on any node in the cluster. However, when I run it on the cluster with "--master yarn --deploy-mode client" (or "cluster"), Kerberos authentication fails. I have tried everything I can think of: running kinit outside the application on each node, and performing the Kerberos login inside the application, but none of it has worked so far. In local mode everything works even when I only run kinit outside and perform no authentication inside the application. In cluster mode, however, nothing works, whether I authenticate inside the application or outside it. Here is an extract of the stack trace:
ERROR ipc.AbstractRpcClient: SASL authentication failed. The most likely cause is missing or invalid credentials. Consider 'kinit'.
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
    at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
Below is the code that I used for authenticating inside the application:
Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path(hbaseConfDir, "hbase-site.xml"));
conf.addResource(new Path(hadoopConfDir, "core-site.xml"));
conf.set("hbase.client.keyvalue.maxsize", "0");
conf.set("hbase.rpc.controllerfactory.class", "org.apache.hadoop.hbase.ipc.RpcControllerFactory");
// --- bold lines begin ---
conf.set("hadoop.security.authentication", "kerberos");
conf.set("hbase.security.authentication", "kerberos");
UserGroupInformation.setConfiguration(conf);
String keyTab = "/etc/security/keytabs/somekeytab";
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI("name@xyz.com", keyTab);
UserGroupInformation.setLoginUser(ugi);
// --- bold lines end ---
connection = ConnectionFactory.createConnection(conf);
logger.debug("HBase connected");
Adding or removing the bold lines in the code above made no real difference, except that when they are present, running kinit outside the application is not needed.
Please let me know how I can solve this problem. I have been banging my head against it for quite some time.
Created 07-21-2016 04:02 PM
You should not rely on an external ticket cache for distributed jobs. The best solution is to ship a keytab with your application or rely on a keytab being deployed on all nodes where your Spark task may be executed.
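Before attempting any login, it can help to verify that the keytab actually exists and is readable on the local node; on YARN this has to hold on every node an executor may land on. A minimal stdlib sketch (the path is the placeholder from the post, substitute your real keytab):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class KeytabPreflight {
    // Placeholder path from the post above; substitute your real keytab.
    static final String KEYTAB = "/etc/security/keytabs/somekeytab";

    public static void main(String[] args) {
        Path p = Paths.get(KEYTAB);
        if (Files.isReadable(p)) {
            System.out.println("keytab OK: " + p);
        } else {
            // On YARN, this check must pass on EVERY node where an
            // executor may be scheduled, not just the submit host.
            System.out.println("keytab missing or unreadable: " + p);
        }
    }
}
```

Running this as a first step in the driver and executors makes "file not deployed here" failures obvious before they surface as opaque GSSException errors.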
You likely want to replace:
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI("name@xyz.com", keyTab);
UserGroupInformation.setLoginUser(ugi);
With:
UserGroupInformation.loginUserFromKeytab("name@xyz.com", keyTab);
connection = ConnectionFactory.createConnection(conf);
With your approach above, you would need to do something like the following after obtaining the UserGroupInformation instance:
ugi.doAs(new PrivilegedAction<Void>() {
    public Void run() {
        connection = ConnectionFactory.createConnection(conf);
        ...
        return null;
    }
});
Created 09-14-2016 12:35 PM
The above solution is not working for me. However, I found the error below in the debug log. I have the HBase libs present in my --driver-class-path and --jars.

16/09/14 16:56:26 INFO YarnSparkHadoopUtil: HBase class not found
java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
16/09/14 16:56:26 DEBUG YarnSparkHadoopUtil: HBase class not found
java.lang.ClassNotFoundException: org.apache.hadoop.hbase.HBaseConfiguration
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.obtainTokenForHBaseInner(YarnSparkHadoopUtil.scala:381)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.obtainTokenForHBase(YarnSparkHadoopUtil.scala:362)
    at org.apache.spark.deploy.yarn.YarnSparkHadoopUtil.obtainTokenForHBase(YarnSparkHadoopUtil.scala:165)
    at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:349)
    at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:733)
    at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:143)
    at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend.start(YarnClientSchedulerBackend.scala:56)
    at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:144)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:530)
    at com.xyz.demo.dq.util.ContextBuilder$.getSparkContext(DQUtils.scala:118)
    at com.xyz.demo.dq.DataQualityApplicationHandler$delayedInit$body.apply(DataQualityApplicationHandler.scala:62)
    at scala.Function0$class.apply$mcV$sp(Function0.scala:40)
    at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
    at scala.App$anonfun$main$1.apply(App.scala:71)
    at scala.App$anonfun$main$1.apply(App.scala:71)
    at scala.collection.immutable.List.foreach(List.scala:318)
    at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:32)
    at scala.App$class.main(App.scala:71)
    at com.xyz.demo.dq.DataQualityApplicationHandler$.main(DataQualityApplicationHandler.scala:52)
    at com.xyz.demo.dq.DataQualityApplicationHandler.main(DataQualityApplicationHandler.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
16/09/14 16:56:26 WARN Token: Cannot find class for token kind HIVE_DELEGATION_TOKEN
16/09/14 16:56:26 WARN Token: Cannot find class for token kind HIVE_DELEGATION_TOKEN
16/09/14 16:56:26 DEBUG Client: Kind: HDFS_DELEGATION_TOKEN, Service: 10.60.70.10:8020, Ident: (HDFS_DELEGATION_TOKEN token 9045 for ctadmin); HDFS_DELEGATION_TOKEN token 9045 for ctadmin; Renewer: yarn; Issued: 9/14/16 4:56 PM; Max Date: 9/21/16 4:56 PM
Kind: HIVE_DELEGATION_TOKEN, Service: , Ident: 00 12 63 74 61 64 6d 69 6e 40 48 53 43 41 4c 45 2e 43 4f 4d 04 68 69 76 65 00 8a 01 57 28 72 aa ff 8a 01 57 4c 7f 2e ff 2a 40; null
Created 09-14-2016 03:49 PM
You have a completely different error, @Ashish Gupta. Please create your own question for this issue. It is related to your classpath.
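For anyone hitting the same ClassNotFoundException: a quick way to check whether the HBase classes are visible to a given JVM is a one-line probe, run with the same classpath you hand to spark-submit (a minimal sketch; the class name is the one from the log above):

```java
public class HBaseClasspathProbe {
    public static void main(String[] args) {
        // The class Spark's YARN client reflects on when obtaining HBase tokens.
        String cls = "org.apache.hadoop.hbase.HBaseConfiguration";
        try {
            Class.forName(cls);
            System.out.println("FOUND " + cls);
        } catch (ClassNotFoundException e) {
            // If this prints, the hbase-client jars are not on this JVM's classpath.
            System.out.println("MISSING " + cls);
        }
    }
}
```

Note that --jars affects executor and (in cluster mode) driver containers, while the client-side JVM that submits the application also needs the jars visible, e.g. via --driver-class-path in client mode; the probe makes it easy to test each context separately.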
Created 09-15-2016 10:52 AM
@Josh Elser You are correct. My issue was different; it was related to the classpath. I resolved that, and now, while connecting to the secure cluster with the above solution, I am getting the error below. Could you please help me out?
Caused by: org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: demo-dev1-nn/10.60.70.10:16000
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1540)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionManager.java:1560)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.getKeepAliveMasterService(ConnectionManager.java:1711)
    at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:38)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithRetries(RpcRetryingCaller.java:124)
    ... 54 more
Caused by: com.google.protobuf.ServiceException: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: demo-dev1-nn/10.60.70.10:16000
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:223)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:287)
    at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$BlockingStub.isMasterRunning(MasterProtos.java:58152)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$MasterServiceStubMaker.isMasterRunning(ConnectionManager.java:1571)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStubNoRetries(ConnectionManager.java:1509)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation$StubMaker.makeStub(ConnectionManager.java:1531)
    ... 58 more
Caused by: org.apache.hadoop.hbase.ipc.FailedServerException: This server is in the failed servers list: demo-dev1-nn/10.60.70.10:16000
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:701)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:887)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:856)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1200)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:213)
    ... 63 more
Created 09-15-2016 02:47 PM
Again, your issue is unrelated to this one. Please stop piggy-backing on other issues and create one for yourself.
Created 07-21-2016 06:44 PM
I tried to ship the keytab file using the "--files" option and then read it with SparkFiles.get("xyz.keytab"). I also tried the following statement, but it didn't work:
UserGroupInformation.loginUserFromKeytab("name@xyz.com", keyTab);
However, your suggestion about wrapping the connection in ugi.doAs() helped me resolve this issue.
Here is the full code, if anyone else gets into the same trouble:
UserGroupInformation.setConfiguration(conf);
String keyTab = "/etc/security/keytabs/somekeytab";
UserGroupInformation ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI("name@xyz.com", keyTab);
UserGroupInformation.setLoginUser(ugi);
ugi.doAs(new PrivilegedExceptionAction<Void>() {
    @Override
    public Void run() throws IOException {
        connection = ConnectionFactory.createConnection(conf);
        return null;
    }
});
Created 07-21-2016 06:51 PM
Great. Glad you got it working in the end. I'm not sure how resource localization works in Spark (can only compare it to how I know YARN works).
The explanation behind those two different UserGroupInformation calls is that the one you invoked, loginUserFromKeytabAndReturnUGI(), does not alter the static "current user" state inside UserGroupInformation and the JAAS login system; that is why you need the doAs() call. If you use loginUserFromKeytab() instead, you can drop the doAs() and just interact with HBase normally.
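The doAs() mechanics can be sketched with the plain JDK Subject API, which (to my understanding) UserGroupInformation wraps internally. This is a stdlib-only illustration, not the Hadoop API; the "connected" string stands in for the real ConnectionFactory.createConnection(conf) call:

```java
import java.security.PrivilegedActionException;
import java.security.PrivilegedExceptionAction;
import javax.security.auth.Subject;

public class DoAsDemo {
    public static void main(String[] args) {
        // Stand-in for the identity returned by loginUserFromKeytabAndReturnUGI().
        Subject subject = new Subject();
        try {
            // Everything inside run() executes under the supplied identity,
            // which is how security-aware clients pick up the right credentials.
            String result = Subject.doAs(subject, (PrivilegedExceptionAction<String>) () -> {
                // In the real code: connection = ConnectionFactory.createConnection(conf);
                return "connected";
            });
            System.out.println(result);
        } catch (PrivilegedActionException e) {
            // Checked exceptions thrown from run() (e.g. IOException) arrive wrapped here.
            System.out.println("failed: " + e.getCause());
        }
    }
}
```

This is also why the working code above uses PrivilegedExceptionAction rather than PrivilegedAction: createConnection() throws a checked IOException, which surfaces wrapped in a PrivilegedActionException.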
Created 07-21-2016 07:25 PM
Sorry, I just corrected the code that worked for me: loginUserFromKeytab() didn't work, but loginUserFromKeytabAndReturnUGI() with doAs() did.