One of the NodeManagers crashed, and when I looked at the ApplicationMaster I saw that one task was killed, but the ApplicationMaster didn't start another task attempt. I also see the same task in pending state with the same ID. When I try to kill the task using `mapred job -kill-task`, the command reports that it was killed, but the UI still shows it as pending.
The mapreduce.task.timeout is set to 10 minutes, but the task was hanging for 30 minutes.
What am I missing? How can I force it to start the task on another node without killing the whole job? I'm using CDH 5.5.4.
The fact that a NM crashes does not mean that the containers on the node also crash. In CDH 5.5 and later, work-preserving recovery on the NM is also turned on. That means a NM can be restarted without the containers being taken down, and after a restart the NM will pick up the containers that are still there. The status might not update until the NM is started again, because the container communicates with the NM for that.
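For reference, work-preserving NM recovery is controlled by a few properties in yarn-site.xml. The sketch below shows roughly what that configuration looks like; the recovery directory path and port are just examples, not values from this cluster:

```xml
<!-- yarn-site.xml: work-preserving NodeManager recovery (example values) -->
<property>
  <name>yarn.nodemanager.recovery.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Local path where the NM keeps its recovery state; example path -->
  <name>yarn.nodemanager.recovery.dir</name>
  <value>/var/lib/hadoop-yarn/nm-recovery</value>
</property>
<property>
  <!-- A fixed (non-ephemeral) port so containers can reconnect after a restart -->
  <name>yarn.nodemanager.address</name>
  <value>0.0.0.0:8041</value>
</property>
```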
Did you restart the NM after it crashed?
Can you also explain where you saw that the task was hanging?
The NM crashed and we didn't bring it back up, since it needed smart hands on site, so we decided to wait with it.
I can't restart the NM since the node was down and not reachable.
In the YARN UI, when I clicked the ApplicationMaster URL, where it shows the status of the mappers and reducers, I saw that one mapper was in pending status, and that task was on the NM that crashed.
@Wilfred Suppose we run the command `shutdown -h now` on the whole server; how will the tasks that were on this node's NM be managed, assuming the node remains down?
Is there a way to force these tasks to move to another NM without killing the whole job? As I stated, when I tried to kill this specific task, the CLI showed me the task was killed, but in the ApplicationMaster I could still see a pending mapper.
Can someone please advise what the right flow is for the containers on a crashed node? Should the containers start on another node, or should the whole application be killed?
Suppose I want to take the node down for a few hours. What are the right steps to take so I can guarantee the containers will start on another node and will not stay in pending state until the node comes back up?
For normal maintenance: you decommission the node, or just the NM on the node, which removes it from the cluster, makes sure the RM is updated, and shuts down the containers.
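As a sketch, decommissioning the NM usually means listing the host in the RM's exclude file and refreshing the node list. The file path below is an assumption; it must match whatever `yarn.resourcemanager.nodes.exclude-path` points to in your yarn-site.xml:

```shell
# Add the host to the YARN exclude file (example path; it must match
# yarn.resourcemanager.nodes.exclude-path on the ResourceManager).
echo "badnode.example.com" >> /etc/hadoop/conf/yarn.exclude

# Tell the RM to re-read the include/exclude lists; the node is then
# decommissioned and its containers are rescheduled elsewhere.
yarn rmadmin -refreshNodes
```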
If you kill the container, i.e. the Java process that runs on the node, then the AM will time out the container, mark it as failed, and start it on another node.
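On the node itself, killing the container JVM by hand might look like the sketch below. The grep pattern and container ID are made-up examples; MapReduce task JVMs typically show up as `YarnChild` in the process list:

```shell
# List Java processes on the node; MR task containers run as YarnChild.
jps -l

# Or search by the container ID shown in the AM UI (example ID, made up).
ps -ef | grep container_1466084309829_0001_01_000002

# Kill the JVM; the AM will eventually time out the attempt, mark it
# failed, and schedule a new attempt on another node.
kill -9 <pid>
```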
If the node is completely off the network, the delay before noticing that the node and its containers are gone is far longer. We fixed that via HADOOP-11252; however, we did not turn it on by default. The timeout is infinite and we fall back to the TCP timeouts, which can be really long in these cases.
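HADOOP-11252 added an overall RPC client timeout that is off (0, i.e. infinite) by default. To turn it on you set it in core-site.xml, roughly like this; the 60-second value is just an example, not a recommendation:

```xml
<!-- core-site.xml: overall timeout for RPC calls (HADOOP-11252).
     0 (the default) means no timeout; clients fall back to TCP timeouts. -->
<property>
  <name>ipc.client.rpc-timeout.ms</name>
  <value>60000</value>
</property>
```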
@Wilfred Hi Wilfred,
As you can see from the screenshot, the attempt is reported as lost, and the elapsed time keeps increasing even though the attempt was killed.
You can also see from the tasks that there is no progress; when I try to run the kill command, the attempt just shows as killing.
Are you sure that the node (i.e. the hardware) had crashed and was no longer reachable?
It looks like the NM had crashed while the HW was still up and running, and thus the container was still up and running. If you use YARN to kill a task attempt, you need the NM to be up and running, because the NM handles the container kill and cleanup. If the container does not get told to exit, it will sit there and do its work until it is done.
The RM removed the container from all its housekeeping details; there is no guarantee that the container is also removed from the node in this case.
Did you use the mapred or yarn command to kill the attempt?
Have you collected the container log and checked what was going on?
Did you check the application master to see what it thought the attempt was still doing?