I am using CDAP, an application framework running on top of HDFS / YARN / HBase. The CDAP framework runs as a long-running YARN application, and individual "flows" launched by the framework run as their own YARN applications. I am intermittently getting InvalidAMRMToken exceptions in the application master logs for the individual flow applications. These exceptions are logged for about 10 minutes until the ApplicationMaster eventually decides to shut down. An example log snippet is:
17:59:50.533 [ApplicationMasterService] WARN o.a.h.security.UserGroupInformation - PriviledgedActionException as:yarn (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1440780893511_3035_000001
17:59:50.583 [ApplicationMasterService] WARN org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1440780893511_3035_000001
These exceptions happen intermittently, and only on the child flows, never on the framework application itself. They can also occur after a relatively short period, sometimes after only 18 hours, so I don't believe the settings mentioned here are relevant: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_sg_yarn_long_jobs.ht...
Any help or suggestions would be appreciated.
Have you tried making the changes suggested?
If they do not work for you, let us know, but for now I do think that you need to configure the YARN side to support long-running applications.
I didn't realize this when I first posted, but we apparently did make those changes a while back, and are seeing the problem anyway.