My cluster seems to be having a problem with the App Timeline Server (ATS) and Resource Manager running at the same time.
As a test, I stopped both services, cleared out the log files, and just started RM. Ambari shows RM has started fine, but...
Here's what I found in the logs.
2018-05-31 15:57:42,336 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1527213246626_3635 State change from NEW to FAILED
2018-05-31 15:57:42,336 WARN rmapp.RMAppImpl (RMAppImpl.java:<init>(423)) - The specific max attempts: 0 for application: 3636 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead.
2018-05-31 15:57:42,336 INFO rmapp.RMAppImpl (RMAppImpl.java:recover(792)) - Recovering app: application_1527213246626_3636 with 0 attempts and final state = FAILED
2018-05-31 15:57:49,435 WARN resourcemanager.RMAuditLogger: USER=dr.who OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=org.apache.hadoop.security.AccessControlException: Queue root.default already has 1 applications, cannot accept submission of application: application_1527213246626_3508 APPID=application_1527213246626_3508
What's going on here?
@Mike Wong Ambari is correct in saying that the RM is running. Among other things, the RM logs information about applications that are running, being submitted, or failing on the cluster. You should check your YARN scheduler configuration: judging by the following message, your default queue accepts only 1 application at a time.
Queue root.default already has 1 applications, cannot accept submission of application
@Mike Wong I suggest you read over:
I also recommend checking yarn.scheduler.minimum-allocation-mb (in case it is too big) and yarn.scheduler.maximum-allocation-mb (in case it is too small). A good read here
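For reference, this is roughly how those two properties appear in yarn-site.xml. The values shown are the stock Hadoop defaults (1024 MB and 8192 MB), included only as an illustration; they are not a recommendation for this cluster, and the right values depend on the memory available on your NodeManagers.

```xml
<!-- yarn-site.xml: illustrative fragment, values are the upstream defaults -->
<property>
  <!-- Smallest container the scheduler will allocate; requests are rounded up to this -->
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>1024</value>
</property>
<property>
  <!-- Largest container the scheduler will allocate; larger requests are rejected -->
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>8192</value>
</property>
```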
@Mike Wong Yes, the default queue has no capacity; all capacity is assigned to the llap queue. When you submit an application without specifying a queue, it usually goes to the default queue. Since that queue has no capacity, you run into the problem above. To move forward, I suggest taking some capacity from llap and giving it to default: set default to 30 or so and llap to 70.
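As a rough sketch of that 30/70 split, the Capacity Scheduler settings might look like the fragment below. It assumes default and llap are the only two queues directly under root (the capacities of sibling queues must add up to 100); if your cluster has other queues, adjust accordingly. In Ambari the same properties are usually edited as key=value lines in the Capacity Scheduler section of the YARN configs rather than as XML.

```xml
<!-- capacity-scheduler.xml: illustrative fragment, assuming only two queues under root -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>default,llap</value>
</property>
<property>
  <!-- Give the default queue a non-zero share so it can accept applications -->
  <name>yarn.scheduler.capacity.root.default.capacity</name>
  <value>30</value>
</property>
<property>
  <!-- Reduce llap by the same amount so the siblings still sum to 100 -->
  <name>yarn.scheduler.capacity.root.llap.capacity</name>
  <value>70</value>
</property>
```

After changing queue capacities, the scheduler has to reload them, for example with `yarn rmadmin -refreshQueues` or by refreshing the queues from Ambari.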
Right now, yarn.resourcemanager.scheduler.monitor.enable is set to false.
Looks like there are a few settings I'll have to add to yarn-site.xml:
I'll get to it...