New Contributor
Posts: 3
Registered: ‎09-20-2015

Intermittent InvalidAMRMToken exceptions in CASK application

Hi,

I am using CDAP, an application framework running on top of HDFS / YARN / HBase. The CDAP framework runs as a long-running YARN application, and individual "flows" launched by the framework run as their own YARN applications. I am intermittently getting InvalidAMRMToken exceptions in the application master logs for the individual flow applications. These exceptions are logged for a period of about 10 minutes until the ApplicationMaster eventually decides to shut down. An example log snippet is:

17:59:50.533 [ApplicationMasterService] WARN  o.a.h.security.UserGroupInformation - PriviledgedActionException as:yarn (auth:SIMPLE) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1440780893511_3035_000001
17:59:50.583 [ApplicationMasterService] WARN  org.apache.hadoop.ipc.Client - Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): Invalid AMRMToken from appattempt_1440780893511_3035_000001
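
To see how long such a burst lasts for each application attempt, the warnings can be grouped by attempt ID. The following is a minimal Python sketch, assuming the log format of the two lines above (a leading `HH:MM:SS.mmm` timestamp and the phrase "Invalid AMRMToken from appattempt_..."). The function name `summarize_invalid_tokens` is hypothetical and not part of CDAP or Hadoop.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the snippet above:
#   17:59:50.533 [ApplicationMasterService] WARN ... Invalid AMRMToken from appattempt_<ts>_<app>_<attempt>
LINE_RE = re.compile(
    r"^(?P<time>\d{2}:\d{2}:\d{2}\.\d{3}) "
    r".*Invalid AMRMToken from (?P<attempt>appattempt_\d+_\d+_\d+)"
)


def summarize_invalid_tokens(lines):
    """Return {attempt_id: (first_time, last_time, count)} for matching lines.

    The timestamps carry no date, so a burst that crosses midnight
    is not handled by this sketch.
    """
    summary = {}
    for line in lines:
        match = LINE_RE.search(line.strip())
        if not match:
            continue  # skip lines that are not InvalidAMRMToken warnings
        stamp = datetime.strptime(match.group("time"), "%H:%M:%S.%f").time()
        attempt = match.group("attempt")
        first, last, count = summary.get(attempt, (stamp, stamp, 0))
        summary[attempt] = (min(first, stamp), max(last, stamp), count + 1)
    return summary
```

Passing an open log file (for example `summarize_invalid_tokens(open(path))`, where `path` is wherever the application master log is stored) yields the first and last warning time and the number of warnings per attempt.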

These exceptions happen intermittently, and only on the child flows, never on the framework application itself. They can also occur after a relatively short period, sometimes only 18 hours, so I don't believe the settings mentioned here are relevant: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cm_sg_yarn_long_jobs.ht...

Any help or suggestions would be appreciated.

Cloudera Employee
Posts: 322
Registered: ‎01-16-2014

Re: Intermittent InvalidAMRMToken exceptions in CASK application

Have you tried making the changes suggested?

If they do not work for you, let us know, but for now I do think that you need to configure the YARN side to support long-running applications.

Wilfred

Posts: 1,903
Kudos: 435
Solutions: 307
Registered: ‎07-31-2013

Re: Intermittent InvalidAMRMToken exceptions in CASK application

Also to think about: In MR-land, Oozie has a similar construct, wherein the parent job runs another real working job, and waits on it to complete.

Granted, this is not as elaborate as what CASK is doing, but one point Oozie is careful about is ensuring that the jobs run with mapreduce.job.complete.cancel.delegation.tokens set to false, so that jobs sharing the same tokens do not end up cancelling or expiring each other's tokens while they are still in use.

The situation within CASK may be similar to this (although the MR properties do not directly apply to it), but I've not studied the CASK implementation to be absolutely certain.

It is certainly worth trying the settings mentioned in the article, as Wilfred suggests, which could help with automatic renewal of the tokens and keep things alive. Have you already tried that?

New Contributor
Posts: 3
Registered: ‎09-20-2015

Re: Intermittent InvalidAMRMToken exceptions in CASK application

Hi Wilfred,

I didn't realize this when I first posted, but we apparently did make those changes a while back, and we are seeing the problem anyway.

Thanks,

- Mitch