CDH 5.5.2
Since we upgraded our cluster to 5.5.2, the YARN ResourceManager occasionally fails to update its "containers running", "memory used", and similar counters, even though no running applications are listed.
This prevents new jobs from launching if they require more resources than the amount *wrongly* reported as available; these jobs stay in the "SUBMITTED" state and the cluster is frozen.
This often happens after we kill applications, even gracefully, but also sometimes when jobs finish successfully.
I found nothing wrong in the RM logs. The only fix I have found so far is to restart the RM service.
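
To make the symptom easier to spot, here is a minimal sketch of a check against the RM REST API (`/ws/v1/cluster/metrics`). The RM address in the usage note and the helper names `fetch_cluster_metrics` and `looks_stuck` are illustrative assumptions, not part of our setup, and the check only encodes the symptom described above: resources still reported as allocated while no application is running.

```python
import json
from urllib.request import urlopen


def fetch_cluster_metrics(rm_url):
    """Return the 'clusterMetrics' object from the RM REST API.

    rm_url is the RM web address, e.g. "http://rm-host:8088" (hypothetical).
    """
    url = rm_url.rstrip("/") + "/ws/v1/cluster/metrics"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def looks_stuck(metrics):
    """Heuristic: no running apps, yet containers or memory still allocated."""
    no_running_apps = metrics.get("appsRunning", 0) == 0
    leaked_containers = metrics.get("containersAllocated", 0) > 0
    leaked_memory = metrics.get("allocatedMB", 0) > 0
    return no_running_apps and (leaked_containers or leaked_memory)
```

Calling `looks_stuck(fetch_cluster_metrics("http://rm-host:8088"))` periodically would flag the state we see. It is only a heuristic, so a brief mismatch right after an application ends may not indicate the problem.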
Any ideas?
Thanks,
Philippe