Has anyone got any ideas about where I need to look to resolve this?
It feels like the job is being handed to one box that can run it and another that can't, and when it goes to the one that can't it gets handed out again and again until it hits one that can. I'm not sure where to look to identify the point of failure, though, so that I can either remove that node or correct it.
I've been looking through Oozie, and the map tasks that fail all run on the same nodes; the ones that don't fail run on a different set of nodes, with no crossover between the two. Clearly something on one set of nodes differs from the other, but I can't see what. I've tried restarting YARN on all nodes, with no gain.
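
To pin down which nodes belong to which set, the comparison above can be done mechanically. This is only a rough sketch: it assumes the task attempts have already been copied out of the job history as (host, status) pairs, and the host names, status strings and the function name `split_nodes_by_outcome` are all made up for illustration.

```python
from collections import defaultdict


def split_nodes_by_outcome(attempts):
    """Group hosts by the outcomes of the map attempts that ran on them.

    `attempts` is an iterable of (host, status) pairs. The status strings
    "FAILED" and "SUCCEEDED" are an assumption about how the data was exported.
    """
    outcomes = defaultdict(set)
    for host, status in attempts:
        outcomes[host].add(status)

    # Hosts where every recorded attempt failed.
    failing = sorted(h for h, s in outcomes.items() if s == {"FAILED"})
    # Hosts where every recorded attempt succeeded.
    healthy = sorted(h for h, s in outcomes.items() if s == {"SUCCEEDED"})
    # Anything else (both outcomes, or some other status) needs a closer look.
    mixed = sorted(h for h in outcomes if h not in failing and h not in healthy)
    return failing, healthy, mixed


# Hypothetical sample data, not taken from a real cluster.
attempts = [
    ("node01", "FAILED"),
    ("node02", "SUCCEEDED"),
    ("node01", "FAILED"),
    ("node03", "FAILED"),
    ("node04", "SUCCEEDED"),
    ("node02", "SUCCEEDED"),
]

failing, healthy, mixed = split_nodes_by_outcome(attempts)
print("always fail:   ", failing)
print("always succeed:", healthy)
print("mixed:         ", mixed)
```

If the "mixed" list comes back empty, that matches the no-crossover pattern described above, and it would probably be worth picking one host from each list and comparing them side by side (installed software versions, local disk space, configuration files) to see what differs.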