Hi, I'd like to ask about CDH support for long-running applications on YARN. We are trying to set up Gobblin to work with CDH 5.11.2, but regularly after 2 days we get a message that the AMRMToken is invalid:
CEST INFO [AMRM Heartbeater thread] org.apache.hadoop.io.retry.RetryInvocationHandler - Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm365. Trying to fail over immediately.
org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1535039367371_143153_000001
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
at com.sun.proxy.$Proxy20.allocate(Unknown Source)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1535039367371_143153_000001
at com.sun.proxy.$Proxy19.allocate(Unknown Source)
Do you know why the AM would fail to refresh the token before its 48h expiration period passes? I found this ticket that, I think, would provide support for AM token refresh:
It hasn't been resolved yet, so I wonder whether this functionality has been implemented in another ticket, or is not supported by YARN yet.
I've also found a bug that perhaps relates to the same issue:
This ticket is resolved, but not included in any CDH5 version yet. Do you think we are hitting this issue? If so, when would it be incorporated into CDH?
There can be many reasons for this, so you can check the following to begin with and let us know whether these settings are fine:
1. What is the value of dfs.namenode.delegation.token.max-lifetime on your cluster? It sets the point after which tokens can no longer be renewed. If it is set to less than 2 days, that could explain the behavior.
2. You also need to check the logs to see whether there was an exception while trying to renew the tokens and, if so, resolve it.
3. The renewal also depends on the AM implementation, which in this case is Gobblin, so you need to check how the keytabs are passed to the AM in Gobblin's case and whether the configuration on the Gobblin side is set correctly, for example gobblin.yarn.login.interval.minutes and gobblin.yarn.token.renew.interval.minutes.
We do not support Gobblin, but for similar YARN applications such as Spark, we pass the keytab when submitting the application, and it is used to renew the token. So this needs to be looked at from the Gobblin side too.
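For the first check, here is a minimal Python sketch that reads the delegation token max lifetime out of a Hadoop site file and converts it to days. It assumes the value is stored in milliseconds under the name dfs.namenode.delegation.token.max-lifetime, as listed in Hadoop's hdfs-default.xml. The inline XML is made-up sample data, not taken from your cluster; in practice you would pass in the contents of your own hdfs-site.xml.

```python
import xml.etree.ElementTree as ET

MS_PER_DAY = 24 * 60 * 60 * 1000

# Made-up sample standing in for the contents of hdfs-site.xml (assumption).
SAMPLE_XML = """
<configuration>
  <property>
    <name>dfs.namenode.delegation.token.max-lifetime</name>
    <value>604800000</value>
  </property>
</configuration>
"""


def read_hadoop_property(xml_text, name):
    """Return the value of a property from Hadoop *-site.xml text, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


def lifetime_in_days(millis):
    """Convert a lifetime given in milliseconds (string or int) to days."""
    return int(millis) / MS_PER_DAY


if __name__ == "__main__":
    value = read_hadoop_property(
        SAMPLE_XML, "dfs.namenode.delegation.token.max-lifetime"
    )
    if value is None:
        print("Property not set in this file; the cluster default applies.")
    else:
        print(f"max-lifetime = {lifetime_in_days(value)} days")
```

If the result is below 2 days, that would fit the failure pattern described in the question; a larger value points the investigation elsewhere.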
Thanks & regards
Thank you very much for your answer. At this moment I can only confirm that dfs.namenode.delegation.token.max-lifetime is set to 7 days. We use a Gobblin keytab and have experimented with different settings of gobblin.yarn.login.interval.minutes and gobblin.yarn.token.renew.interval.minutes on the Gobblin side, but with no success yet. I've started a new run of Gobblin now, so we'll need to wait some time for the next failure. I'll check the logs for possible token renewal errors or any other suspicious symptoms and get back to this thread with the results.
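For the log check, a rough Python sketch like the one below could help narrow things down. It flags lines that mention InvalidToken, AMRMToken, or renewal. The pattern is only a guess at what is worth looking at, not an exhaustive list of YARN or Gobblin messages, and the sample lines are copied from the trace at the top of this thread.

```python
import re

# Assumed keywords of interest; adjust to whatever your logs actually contain.
TOKEN_PATTERN = re.compile(r"InvalidToken|AMRMToken|renew", re.IGNORECASE)

# Sample lines taken from the stack trace posted in this thread.
SAMPLE_LOG = [
    "Exception while invoking allocate of class "
    "ApplicationMasterProtocolPBClientImpl over rm365. "
    "Trying to fail over immediately.",
    "org.apache.hadoop.security.token.SecretManager$InvalidToken: "
    "Invalid AMRMToken from appattempt_1535039367371_143153_000001",
    "at com.sun.proxy.$Proxy20.allocate(Unknown Source)",
]


def find_token_lines(lines):
    """Return (line_number, text) pairs for lines that look token-related."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if TOKEN_PATTERN.search(line)
    ]


if __name__ == "__main__":
    for number, text in find_token_lines(SAMPLE_LOG):
        print(f"{number}: {text}")
```

To scan a real file, pass an open file object instead of SAMPLE_LOG, since the function only needs an iterable of lines.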