12-12-2017
01:11 AM
Yeah, I suspect this is where the cluster is reaching capacity. Killed task attempts are probably of two types: task attempts rejected because an LLAP daemon is full and won't accept new work, and opportunistic non-finishable tasks that were killed (preemption). The latter happen because Hive starts some tasks (esp. reducers) before all of their inputs are ready, so that they can download inputs from upstream tasks that have already finished while waiting for the other upstream tasks to finish. When parallel queries want to run a task that can run and finish immediately, they preempt non-finishable tasks (otherwise a task that is potentially doing nothing and just waiting for something else to finish could take resources from tasks that are ready). It is normal for the amount of preemption to increase with high-volume concurrent queries. The only way to check whether there are any other (potentially problematic) kills is to check the logs. If cache is not as important for these queries, you can try reducing hive.llap.task.scheduler.locality.delay, which may make task scheduling faster (-1 means infinite; the minimum otherwise is 0). However, once the cluster is at capacity, it's not realistic to expect sub-linear scaling. Improving individual query runtime would also improve aggregate runtime in this case.
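
For reference, a minimal sketch of what lowering the locality delay might look like as a site-file property. The value 0 here is only an illustration, not a recommendation, and where the property needs to be set (for example hive-site.xml versus a management-UI config section for the interactive/LLAP instance) is an assumption that depends on your deployment.

```xml
<!-- Hypothetical value for illustration only; test against your own workload. -->
<property>
  <name>hive.llap.task.scheduler.locality.delay</name>
  <!-- 0 = do not wait for a preferred (cache-local) node; -1 = wait indefinitely -->
  <value>0</value>
</property>
```

The trade-off is that tasks may land on nodes that don't hold the cached data, so this only makes sense when cache hits matter less than scheduling latency for the queries in question.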