Created on 10-26-2017 02:16 PM - edited 09-16-2022 05:27 AM
Hi,
I have a cluster with CDH 5.8.4. I'm running a Spark Streaming application that reads data from and writes data to HBase using the Cloudera spark-hbase connector, namely the HBaseContext.
When I start the application I pass the principal and the keytab to the spark-submit script.
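To give an idea of the setup, here is a simplified sketch of the kind of job I mean (not my actual code; the table name, column family and input source are placeholders):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.Put
import org.apache.hadoop.hbase.spark.HBaseContext
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object HBaseStreamingJob {
  def main(args: Array[String]): Unit = {
    val sparkConf = new SparkConf().setAppName("hbase-streaming-example")
    val ssc = new StreamingContext(sparkConf, Seconds(10))

    // HBaseContext is created on the driver from the cluster's HBase configuration.
    val hbaseConf = HBaseConfiguration.create()
    val hbaseContext = new HBaseContext(ssc.sparkContext, hbaseConf)

    // Placeholder input stream; the real source is different.
    val stream = ssc.socketTextStream("localhost", 9999)

    // Write each record to HBase through the connector.
    hbaseContext.streamBulkPut[String](
      stream,
      TableName.valueOf("example_table"),
      (record: String) => {
        val put = new Put(Bytes.toBytes(record))
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(record))
        put
      }
    )

    ssc.start()
    ssc.awaitTermination()
  }
}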
I'm seeing that after 7 days the application crashes with an error about the expiration of the Kerberos ticket related to the HBase context. This is the error from the executor logs:
ERROR executor.Executor: Exception in task 0.0 in stage 544265.0 (TID 1149098)
org.apache.hadoop.hbase.client.RetriesExhaustedException: Can't get the location
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:326)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:157)
    at org.apache.hadoop.hbase.client.ScannerCallableWithReplicas.call(ScannerCallableWithReplicas.java:61)
    at org.apache.hadoop.hbase.client.RpcRetryingCaller.callWithoutRetries(RpcRetryingCaller.java:200)
    at org.apache.hadoop.hbase.client.ClientScanner.call(ClientScanner.java:320)
    at org.apache.hadoop.hbase.client.ClientScanner.nextScanner(ClientScanner.java:295)
    at org.apache.hadoop.hbase.client.ClientScanner.initializeScannerInConstruction(ClientScanner.java:160)
    at org.apache.hadoop.hbase.client.ClientScanner.<init>(ClientScanner.java:155)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:867)
    at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.restart(TableRecordReaderImpl.java:91)
    at org.apache.hadoop.hbase.mapreduce.TableRecordReaderImpl.initialize(TableRecordReaderImpl.java:169)
    at org.apache.hadoop.hbase.mapreduce.TableRecordReader.initialize(TableRecordReader.java:134)
    at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase$1.initialize(TableInputFormatBase.java:211)
    at org.apache.spark.rdd.NewHadoopRDD$$anon$1.<init>(NewHadoopRDD.scala:164)
    at org.apache.spark.rdd.NewHadoopRDD.compute(NewHadoopRDD.scala:129)
    at org.apache.hadoop.hbase.spark.NewHBaseRDD.compute(NewHBaseRDD.scala:34)
    at org.apache.hadoop.hbase.spark.NewHBaseRDD.compute(NewHBaseRDD.scala:25)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.security.token.SecretManager$InvalidToken: Token has expired
    at sun.reflect.GeneratedConstructorAccessor58.newInstance(Unknown Source)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRemoteException(ProtobufUtil.java:327)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1593)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegionInMeta(ConnectionManager.java:1398)
    at org.apache.hadoop.hbase.client.ConnectionManager$HConnectionImplementation.locateRegion(ConnectionManager.java:1199)
    at org.apache.hadoop.hbase.client.RpcRetryingCallerWithReadReplicas.getRegionLocations(RpcRetryingCallerWithReadReplicas.java:315)
    ... 30 more
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Token has expired
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.readStatus(HBaseSaslRpcClient.java:155)
    at org.apache.hadoop.hbase.security.HBaseSaslRpcClient.saslConnect(HBaseSaslRpcClient.java:222)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupSaslConnection(RpcClientImpl.java:617)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.access$700(RpcClientImpl.java:162)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:743)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection$2.run(RpcClientImpl.java:740)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1783)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.setupIOstreams(RpcClientImpl.java:740)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.writeRequest(RpcClientImpl.java:906)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl$Connection.tracedWriteRequest(RpcClientImpl.java:873)
    at org.apache.hadoop.hbase.ipc.RpcClientImpl.call(RpcClientImpl.java:1242)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient.callBlockingMethod(AbstractRpcClient.java:227)
    at org.apache.hadoop.hbase.ipc.AbstractRpcClient$BlockingRpcChannelImplementation.callBlockingMethod(AbstractRpcClient.java:336)
    at org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$BlockingStub.get(ClientProtos.java:34070)
    at org.apache.hadoop.hbase.protobuf.ProtobufUtil.getRowOrBefore(ProtobufUtil.java:1589)
Does anyone know how to solve this issue?
Thanks in advance,
Beniamino
Created 10-31-2017 04:50 AM
Actually, what you are observing is documented here
The following limitations apply to Spark applications that access HBase in a Kerberized cluster:
Let us know if this answers your question.
Created 10-31-2017 07:39 AM
Thank you for getting back to me.
I'm wondering if it is possible to use HBase with a Spark Streaming application in a Kerberized cluster without restarting the application every 7 days.
I would like to know if there is a best practice (or just an example) for instantiating an HBase connection on each executor.
I tried to instantiate a connection on the driver and send it to the executors using a broadcast variable, but this solution didn't work out because the HBase connection object is not serializable.
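To be clear about the kind of approach I mean, here is a rough sketch where the connection is opened inside each partition on the executor, so it never has to be serialized (table and column names are placeholders):

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.streaming.dstream.DStream

def writeToHBase(stream: DStream[String]): Unit = {
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // This block runs on the executor, so the Connection is created locally
      // and never shipped from the driver.
      val conf = HBaseConfiguration.create()
      val connection = ConnectionFactory.createConnection(conf)
      val table = connection.getTable(TableName.valueOf("example_table"))
      try {
        records.foreach { record =>
          val put = new Put(Bytes.toBytes(record))
          put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("value"), Bytes.toBytes(record))
          table.put(put)
        }
      } finally {
        table.close()
        connection.close()
      }
    }
  }
}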
Have you already faced and maybe solved this issue?
Thanks in advance,
Beniamino
Created 10-31-2017 10:06 PM
Hi Beniamino,
I am not very sure about your second question, but regarding the expired Kerberos ticket: this has been fixed in CDH 5.12+. Alternatively (depending on whether you are using Java or Python), you can call UserGroupInformation.checkTGTAndReloginFromKeytab() periodically, before each put per RDD partition.
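As a rough Scala sketch of that relogin idea, assuming the keytab has been shipped to the executors (for example with --files) so it can be read locally there; the principal, keytab path and table name below are placeholders:

import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.security.UserGroupInformation
import org.apache.spark.rdd.RDD

def putPartitions(rdd: RDD[Put], tableName: String,
                  principal: String, keytabPath: String): Unit = {
  rdd.foreachPartition { puts =>
    // Make sure this executor JVM has a keytab-based login, then refresh the
    // TGT if it is close to expiring, before touching HBase.
    if (!UserGroupInformation.isLoginKeytabBased()) {
      UserGroupInformation.loginUserFromKeytab(principal, keytabPath)
    }
    UserGroupInformation.getLoginUser.checkTGTAndReloginFromKeytab()

    val connection = ConnectionFactory.createConnection(HBaseConfiguration.create())
    val table = connection.getTable(TableName.valueOf(tableName))
    try {
      puts.foreach(table.put)
    } finally {
      table.close()
      connection.close()
    }
  }
}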
Created 07-05-2018 12:40 PM
I am still experiencing this. I am running Spark2 on CDH 5.14.2. I noticed this in the YARN service log; not sure if it is related:
"No TokenRenewer defined for token kind HBASE_AUTH_TOKEN"
Is this the reason an HBase Spark2 job would die after 7 days?