
Intermittent InvalidAMRMToken exceptions in CASK application

New Contributor



I am using CDAP, an application framework running on top of HDFS / YARN / HBase. The CDAP framework runs as a long-running YARN application, and individual "flows" launched by the framework run as their own YARN applications. I am intermittently getting InvalidAMRMToken exceptions in the application master logs for the individual flow applications. These exceptions are logged for a period of about 10 minutes until the ApplicationMaster eventually decides to shut down. An example log snippet is:

17:59:50.533 [ApplicationMasterService] WARN - PriviledgedActionException as:yarn (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException($InvalidToken): Invalid AMRMToken from appattempt_1440780893511_3035_000001
17:59:50.583 [ApplicationMasterService] WARN  org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException($InvalidToken): Invalid AMRMToken from appattempt_1440780893511_3035_000001

These exceptions happen intermittently, and only on the child flows, never on the framework application itself. They can also occur after a relatively short period, sometimes only 18 hours, so I don't believe the settings mentioned here are relevant.
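
To check how long the warnings last for each attempt, log lines like the two above can be grouped by application attempt ID. The following is only a minimal Python sketch: the function name `summarize_invalid_tokens` and the regular expression are illustrative assumptions based on the two sample lines, not part of CDAP or Hadoop, and other log layouts would need a different pattern.

```python
import re
from datetime import datetime

# Assumed layout, taken from the sample lines above: an HH:MM:SS.mmm timestamp
# at the start of the line and an "Invalid AMRMToken from appattempt_..."
# message somewhere after it.
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Invalid AMRMToken from (?P<attempt>appattempt_\d+_\d+_\d+)"
)


def summarize_invalid_tokens(lines):
    """Return {attempt_id: (count, first_time, last_time, span_seconds)}.

    The log timestamps carry no date, so a span that crosses midnight
    would be computed incorrectly by this sketch.
    """
    seen = {}
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an InvalidAMRMToken line
        stamp = datetime.strptime(match.group("ts"), "%H:%M:%S.%f")
        seen.setdefault(match.group("attempt"), []).append(stamp)

    summary = {}
    for attempt, stamps in seen.items():
        first, last = min(stamps), max(stamps)
        span = (last - first).total_seconds()
        summary[attempt] = (len(stamps), first.time(), last.time(), span)
    return summary
```

Passing the lines of an application master log file to `summarize_invalid_tokens` gives, per attempt, the number of warnings and the time between the first and the last one, which can be compared against the roughly 10 minutes described above.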



Any help or suggestions would be appreciated.


Re: Intermittent InvalidAMRMToken exceptions in CASK application

Super Collaborator

Have you tried making the changes suggested?

If they do not work for you, let us know, but for now I do think that you need to configure the YARN side to support long-running applications.



Re: Intermittent InvalidAMRMToken exceptions in CASK application

Master Guru
Also to think about: In MR-land, Oozie has a similar construct, wherein the parent job runs another real working job, and waits on it to complete.

Granted, this is not as elaborate as what CASK is doing, but one pitfall that Oozie does try to cover is ensuring that the jobs run with mapreduce.job.complete.cancel.delegation.tokens set to false, so that jobs sharing the same tokens do not end up cancelling or expiring each other's live usage.

The situation within CASK may be similar to this (although the MR properties do not directly apply to it), but I've not studied the CASK implementation to be absolutely certain.

It is certainly worth trying the settings mentioned in the article, as Wilfred suggests, since they could help with automatic renewal of the tokens and keep things alive. Have you already tried that?

Re: Intermittent InvalidAMRMToken exceptions in CASK application

New Contributor

Hi Wilfred,


I didn't realize this when I first posted, but we apparently did make those changes a while back, and are seeing the problem anyway. 



- Mitch