Explorer
Posts: 16
Registered: ‎09-25-2014

Long running yarn app (storm-yarn) exits with Invalid AMRMToken after some random number of hours.

Hi, I'm running storm-yarn on CDH 5.3.2 and it's working great, except that it exits seemingly at random after running for anywhere between a few hours and a few days.

I see an exception in the YARN log, followed by a graceful shutdown:

 

15/04/16 10:58:15 ERROR yarn.MasterServer: Unhandled error in AM:
org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1429008207550_0003_000002
              at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
              at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
              at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
              at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
              at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
              at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
              at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
              at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
              at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
              at java.lang.reflect.Method.invoke(Method.java:606)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
              at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
              at com.sun.proxy.$Proxy12.allocate(Unknown Source)
              at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
              at com.yahoo.storm.yarn.MasterServer$1.run(MasterServer.java:69)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1429008207550_0003_000002
              at org.apache.hadoop.ipc.Client.call(Client.java:1411)
              at org.apache.hadoop.ipc.Client.call(Client.java:1364)
              at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
              at com.sun.proxy.$Proxy11.allocate(Unknown Source)
              at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
              ... 8 more
15/04/16 10:58:15 INFO yarn.StormMasterServerHandler: stopping supervisors...
15/04/16 10:58:15 INFO yarn.StormMasterServerHandler: stopping UI...
15/04/16 10:58:15 INFO yarn.StormMasterServerHandler: stopping nimbus...
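
For anyone reading the trace: the appattempt id in the error carries some context of its own. Below is a minimal Python sketch (not part of storm-yarn or Hadoop; the names `AppAttempt`, `parse_app_attempt` and `rm_start_time` are made up for illustration) that splits the id into its parts. It assumes the usual YARN layout, where the first number is the ResourceManager start time in epoch milliseconds, the second is the application counter, and the third is the attempt number.

```python
import re
from datetime import datetime, timezone
from typing import NamedTuple


class AppAttempt(NamedTuple):
    cluster_timestamp_ms: int  # assumed: RM start time, epoch milliseconds
    app_sequence: int          # application counter since that RM start
    attempt_number: int        # 1 for the first AM attempt, 2 after one restart


# Matches ids such as appattempt_1429008207550_0003_000002
_ATTEMPT_RE = re.compile(r"appattempt_(\d+)_(\d+)_(\d+)")


def parse_app_attempt(text: str) -> AppAttempt:
    """Extract the first appattempt id found in a log line."""
    match = _ATTEMPT_RE.search(text)
    if match is None:
        raise ValueError(f"no appattempt id in {text!r}")
    timestamp, sequence, attempt = (int(group) for group in match.groups())
    return AppAttempt(timestamp, sequence, attempt)


def rm_start_time(attempt: AppAttempt) -> datetime:
    """Convert the cluster timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(
        attempt.cluster_timestamp_ms / 1000, tz=timezone.utc
    )


if __name__ == "__main__":
    line = "Invalid AMRMToken from appattempt_1429008207550_0003_000002"
    parsed = parse_app_attempt(line)
    print(parsed.attempt_number, rm_start_time(parsed).isoformat())
```

Run against the line above, the attempt number comes out as 2, which suggests the application master had already been restarted once before this failure. Comparing the decoded timestamp with the failure time gives a rough idea of how long the ResourceManager had been up.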

 

This will be a blocking issue for deploying Cloudera. The alternatives are Hortonworks, who claim to support storm-yarn (but we have significant CDH knowledge and don't want to switch), or stand-alone Storm, which we do have experience with (but we would rather run a single cluster).

Any help would be awesome.

Thanks,

James

New Contributor
Posts: 4
Registered: ‎02-09-2015

Re: Long running yarn app (storm-yarn) exits with Invalid AMRMToken after some random number of hour

Today I ran into the same error. I have a long-running Spark Streaming application, and after running for 45 hours it failed with the following error. Any help would be appreciated.

Thank you

Raa Thiruvathuru

15/04/30 05:31:03 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm87
15/04/30 05:31:03 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1429737482057_30278_000003
15/04/30 05:31:03 INFO retry.RetryInvocationHandler: Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm87 after 4 fail over attempts. Trying to fail over immediately.
org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1429737482057_30278_000003
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy18.allocate(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
at org.apache.spark.deploy.yarn.YarnAllocationHandler.allocateResources(YarnAllocationHandler.scala:127)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:287)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1429737482057_30278_000003
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy17.allocate(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
... 9 more
15/04/30 05:31:03 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm349
15/04/30 05:31:03 INFO retry.RetryInvocationHandler: Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm349 after 5 fail over attempts. Trying to fail over after sleeping for 2471ms.
java.net.ConnectException: Call From sdldalplhdw05.suddenlink.cequel3.com/10.48.210.241 to sdldalplhdm03.suddenlink.cequel3.com:8030 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1415)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy17.allocate(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy18.allocate(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
at org.apache.spark.deploy.yarn.YarnAllocationHandler.allocateResources(YarnAllocationHandler.scala:127)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:287)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:606)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:700)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
at org.apache.hadoop.ipc.Client.call(Client.java:1382)
... 13 more
15/04/30 05:31:05 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm87
15/04/30 05:31:05 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1429737482057_30278_000003
15/04/30 05:31:05 INFO retry.RetryInvocationHandler: Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm87 after 6 fail over attempts. Trying to fail over immediately.
org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1429737482057_30278_000003
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy18.allocate(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
at org.apache.spark.deploy.yarn.YarnAllocationHandler.allocateResources(YarnAllocationHandler.scala:127)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:287)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1429737482057_30278_000003
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy17.allocate(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
... 9 more
15/04/30 05:31:05 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm349
15/04/30 05:31:05 INFO retry.RetryInvocationHandler: Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm349 after 7 fail over attempts. Trying to fail over after sleeping for 1369ms.
java.net.ConnectException: Call From sdldalplhdw05.suddenlink.cequel3.com/10.48.210.241 to sdldalplhdm03.suddenlink.cequel3.com:8030 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1415)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy17.allocate(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy18.allocate(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
at org.apache.spark.deploy.yarn.YarnAllocationHandler.allocateResources(YarnAllocationHandler.scala:127)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:287)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:606)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:700)
at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
at org.apache.hadoop.ipc.Client.call(Client.java:1382)
... 13 more
15/04/30 05:31:07 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm87
15/04/30 05:31:07 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1429737482057_30278_000003
15/04/30 05:31:07 INFO retry.RetryInvocationHandler: Exception while invoking allocate of class ApplicationMasterProtocolPBClientImpl over rm87 after 8 fail over attempts. Trying to fail over immediately.
org.apache.hadoop.security.token.SecretManager$InvalidToken: Invalid AMRMToken from appattempt_1429737482057_30278_000003
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:79)
at sun.reflect.GeneratedMethodAccessor1.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy18.allocate(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.allocate(AMRMClientImpl.java:278)
at org.apache.spark.deploy.yarn.YarnAllocationHandler.allocateResources(YarnAllocationHandler.scala:127)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$3.run(ApplicationMaster.scala:287)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1429737482057_30278_000003
at org.apache.hadoop.ipc.Client.call(Client.java:1411)
at org.apache.hadoop.ipc.Client.call(Client.java:1364)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy17.allocate(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationMasterProtocolPBClientImpl.allocate(ApplicationMasterProtocolPBClientImpl.java:77)
... 9 more
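
The excerpt above alternates between two ResourceManager ids: calls to rm87 come back with InvalidToken, while calls to rm349 fail with Connection refused. With a long log that pattern is easy to miss, so here is a rough Python sketch that tallies which exception each RM id answered with. It is only an illustration: `summarize_failovers` is a made-up helper, and it assumes the exception line directly follows the RetryInvocationHandler line, as it does in this excerpt.

```python
import re
from collections import Counter, defaultdict

# "Exception while invoking allocate of class X over rm87 after 4 fail over attempts"
_RETRY_RE = re.compile(
    r"Exception while invoking (\w+) of class \S+ over (\S+) "
    r"after (\d+) fail over attempts"
)
# A line that starts with a fully qualified class name followed by ": "
_EXCEPTION_RE = re.compile(r"^([\w.$]+): ")


def summarize_failovers(lines):
    """Map each RM id to a dict of exception class -> number of occurrences."""
    summary = defaultdict(Counter)
    pending_rm = None  # RM id from the retry line we saw last, if any
    for line in lines:
        retry = _RETRY_RE.search(line)
        if retry:
            pending_rm = retry.group(2)
            continue
        if pending_rm is not None:
            exception = _EXCEPTION_RE.match(line.strip())
            if exception:
                summary[pending_rm][exception.group(1)] += 1
            # Only the line right after the retry message is considered.
            pending_rm = None
    return {rm: dict(counts) for rm, counts in summary.items()}


if __name__ == "__main__":
    import sys

    with open(sys.argv[1]) as handle:
        for rm_id, counts in summarize_failovers(handle).items():
            print(rm_id, counts)
```

Saved under any file name and run with the path of the driver log as its only argument, it prints one line per RM id. If one RM only ever refuses connections and the other only ever rejects the token, that narrows down where to look on the ResourceManager side.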

 

 

Explorer
Posts: 16
Registered: ‎09-25-2014

Re: Long running yarn app (storm-yarn) exits with Invalid AMRMToken after some random number of hour

I noticed that the recently released CDH 5.4 includes Hadoop 2.6, which in theory supports Apache Slider. From what I can tell, Slider is what the people who made storm-yarn are now working on for running Storm on YARN. I will suggest that our team look at it, but it looks like Storm on YARN isn't really ready yet.

Expert Contributor
Posts: 62
Registered: ‎06-03-2014

Re: Long running yarn app (storm-yarn) exits with Invalid AMRMToken after some random number of hour

I have the same problem: Storm jobs running on YARN exit with an invalid AMRMToken after a day or two.

I haven't tested this yet, but the CDH 5.4 documentation includes instructions on how to allow long-running YARN jobs to request new tokens. It documents how to configure the ResourceManager as a proxy user for the corresponding HDFS NameNode, so that the ResourceManager can request new tokens when the existing ones are past their maximum lifetime. I'll test this and let you know whether it resolves the problem:

 

http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_sg_yarn_long_jobs.ht...

 

 

New Contributor
Posts: 2
Registered: ‎07-29-2016

Re: Long running yarn app (storm-yarn) exits with Invalid AMRMToken after some random number of hour

I now have the same problem. Did you eventually solve this, and if so, how?

Thanks

Explorer
Posts: 16
Registered: ‎09-25-2014

Re: Long running yarn app (storm-yarn) exits with Invalid AMRMToken after some random number of hour

Sorry, I don't know. We switched to using Spark Streaming instead.

New Contributor
Posts: 1
Registered: ‎08-30-2016

Re: Long running yarn app (storm-yarn) exits with Invalid AMRMToken after some random number of hour

This is also happening with Spark Streaming running on top of Hadoop 2.6.

YARN restarts the Spark application automatically, but every now and then Spark's checkpoint gets corrupted, even though graceful shutdown is enabled.

I don't see any major fixes for AMRM token issues after Hadoop 2.6.0, so this may be the effect of something else that is not visible in the Spark driver log.

Has anyone else encountered this, or does anyone have ideas about what might be causing it?

Using Spark 1.5.1 and Hadoop 2.6.0, Cloudera distributions.

New Contributor
Posts: 2
Registered: ‎07-29-2016

Re: Long running yarn app (storm-yarn) exits with Invalid AMRMToken after some random number of hour

To contribute to the discussion, let me reply to my earlier comment. In the end, we think we had a network outage. Our Kerberos server sits outside our immediate Hadoop network. When that network went down, the Kerberos server could not be contacted, and then we saw "AMRM token" issues. We are not 100% sure, but we did see a network problem around the time of the "AMRM token" issues, and there were Java IOExceptions being thrown everywhere.

 

In summary, we had a network problem. Check for:

1) network connection problems to Kerberos

2) Java IOExceptions being thrown by other services around the time of the "AMRM token" issues
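
For the first check, something as small as the Python sketch below can be run from a cluster node. It is only a rough probe under a few assumptions: `can_connect` is a made-up helper name, it tests TCP only (Kerberos can also use UDP), and port 88 is the standard Kerberos port, which may not match every site's setup.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False
```

A call such as `can_connect("kdc.example.com", 88)`, with the placeholder host replaced by the real KDC, returns False when the server cannot be reached. Running it periodically and logging the result with a timestamp makes it possible to line up outages with the times of the "AMRM token" errors.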

 

Sorry I could not give a definitive answer.

Info: CDH 4.7.0, Pig scripts