I don't know if these two issues are related, but I can't seem to keep the YARN Timeline Server (TL server) and the ResourceManager (RM) running at the same time.
If I start both the TL server and RM, both crash. If I start only RM, it runs for about 30 minutes and then crashes. If I start only the TL server, it keeps running until I start RM, and then both eventually crash.
Here's the error log from when the TL server fails to start:
2018-05-30 16:17:43,163 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1526956266937_2575 State change from NEW to KILLED
2018-05-30 16:17:43,163 WARN rmapp.RMAppImpl (RMAppImpl.java:<init>(423)) - The specific max attempts: 0 for application: 2576 is invalid, because it is out of the range [1, 2]. Use the global max attempts instead.
2018-05-30 16:17:43,163 INFO rmapp.RMAppImpl (RMAppImpl.java:recover(792)) - Recovering app: application_1526956266937_2576 with 1 attempts and final state = KILLED
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of 'ambari-sudo.sh -H -E test -f /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid && ambari-sudo.sh -H -E pgrep -F /var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid' returned 1.
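For context, the failing command in that traceback is only a status check: `test -f` confirms the PID file exists and `pgrep -F` confirms that the PID in it belongs to a live process, so "returned 1" means one of those two conditions failed. Below is a minimal sketch of roughly the same check in Python 3, in case it helps to run it by hand. The function name `pid_file_process_alive` is made up for illustration and this is not Ambari's own code; only the PID file path comes from the error above.

```python
import os


def pid_file_process_alive(pid_file):
    """Return True if pid_file exists and names a live process.

    Hypothetical helper: a rough stand-in for
    `test -f <pid_file> && pgrep -F <pid_file>`.
    """
    # Equivalent of `test -f`: the PID file must exist as a regular file.
    if not os.path.isfile(pid_file):
        return False
    try:
        with open(pid_file) as fh:
            pid = int(fh.read().strip())
    except (OSError, ValueError):
        # Unreadable file or contents that are not a number.
        return False
    if pid <= 0:
        # Guard: PID 0 or a negative value would address a process group.
        return False
    try:
        # Signal 0 delivers nothing; it only checks that the PID exists.
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # stale PID file, no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


if __name__ == "__main__":
    # Path taken from the ExecutionFailed message above.
    path = "/var/run/hadoop-yarn/yarn/yarn-yarn-timelineserver.pid"
    state = "alive" if pid_file_process_alive(path) else "not running"
    print(path, state)
```

If this reports "not running" right after a start attempt, the TL server process has already exited (or never wrote its PID file), and the real cause should be in the Timeline Server's own log rather than in this traceback.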
And here's a small snippet of what I think is the problem, from the yarn-yarn-resourcemanager<server>.log file:
2018-05-30 16:17:48,159 INFO rmapp.RMAppImpl (RMAppImpl.java:handle(779)) - application_1527213246626_10169 State change from NEW to FAILED
2018-05-30 16:17:48,159 INFO rmapp.RMAppImpl (RMAppImpl.java:recover(792)) - Recovering app: application_1527213246626_10170 with 0 attempts and final state = FAILED
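To get a sense of how many applications RM is recovering at startup, the "Recovering app" lines can be tallied by final state. This is only a sketch, assuming Python 3 and assuming every recovery line has the same shape as the ones pasted above. The names `RECOVER_RE` and `tally_recovered_apps` are made up for illustration.

```python
import re
import sys
from collections import Counter

# Pattern modeled on the "Recovering app" lines pasted above (an assumption
# about the log format, based only on those samples).
RECOVER_RE = re.compile(
    r"Recovering app: (application_\d+_\d+) "
    r"with (\d+) attempts and final state = (\w+)"
)


def tally_recovered_apps(lines):
    """Count recovered applications per final state (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = RECOVER_RE.search(line)
        if match:
            counts[match.group(3)] += 1
    return counts


if __name__ == "__main__":
    # Usage: python tally_recovery.py <path to the ResourceManager log>
    with open(sys.argv[1], errors="replace") as log:
        for state, count in tally_recovered_apps(log).most_common():
            print(state, count)
```

A very large total would suggest that RM is replaying a long history of finished applications on every start, which would be worth mentioning when asking for help.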