Registered: ‎03-30-2016

Resource Manager container count not updated, prevents new jobs from starting

CDH 5.5.2

 

Since we upgraded our cluster to 5.5.2, the YARN Resource Manager occasionally fails to update its "containers running", "memory used", and related counters, even though no running applications are listed any more.

 

This prevents new jobs from launching if they require more resources than the *wrongly* reported available ones: these jobs stay "SUBMITTED" and the cluster is frozen.

 

This often happens after we kill applications, even gracefully, but sometimes also when jobs finish successfully.

 

I found nothing wrong in the RM logs. The only way I have found so far to fix this is to restart the RM service.
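
For anyone trying to confirm the same symptom, here is a minimal sketch of how the mismatch could be detected from the RM REST API instead of the web UI. It assumes the RM web service is reachable on the default port 8088; the host name `resourcemanager.example.com` and the helper names are hypothetical, and the "no running apps but containers still allocated" test is only a heuristic, not an official health check.

```python
import json
from urllib.request import urlopen

# Assumption: hypothetical host name; 8088 is the default RM web port.
RM_URL = "http://resourcemanager.example.com:8088"


def fetch_cluster_metrics(rm_url=RM_URL):
    """Return the clusterMetrics object from the RM REST API."""
    with urlopen(rm_url + "/ws/v1/cluster/metrics", timeout=10) as resp:
        return json.load(resp)["clusterMetrics"]


def looks_stuck(metrics):
    """Heuristic: nothing is running, yet resources are still counted as in use."""
    no_apps_running = metrics.get("appsRunning", 0) == 0
    still_allocated = (
        metrics.get("containersAllocated", 0) > 0
        or metrics.get("allocatedMB", 0) > 0
    )
    return no_apps_running and still_allocated
```

Calling `looks_stuck(fetch_cluster_metrics())` from a cron job or monitoring script would flag the frozen state described above; a brief `True` right after an application finishes may just be normal cleanup, so it is probably worth alerting only if it persists over several polls.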

 

Any ideas?

Thanks,

Philippe
