About james_kuo

sergey · ‎12-12-2017

Yeah, I suspect this is where the cluster is reaching capacity. Killed task attempts are probably of two types: task attempts rejected because the LLAP daemon is full and won't accept new work, and opportunistic non-finishable tasks that were killed (preemption). The latter happen because Hive starts some tasks (esp. reducers) before all of their inputs are ready, so that they can download inputs from upstream tasks that have already finished while waiting for the remaining upstream tasks. When parallel queries want to run a task that can run and finish immediately, they preempt non-finishable tasks (otherwise a task that is potentially doing nothing and waiting for something else to finish could take resources from tasks that are ready). It is normal for the amount of preemption to increase with a high volume of concurrent queries. The only way to check whether there are any other (potentially problematic) kills now is to check the logs.

If cache is not as important for these queries, you can try reducing hive.llap.task.scheduler.locality.delay, which may cause faster scheduling for tasks (-1 means infinite; the minimum otherwise is 0). However, once the cluster is at capacity, it's not realistic to expect sub-linear scaling; individual query runtime improvements would also improve aggregate runtime in this case.
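To make the two kinds of kills described above concrete, here is a rough toy model in Python. It is not LLAP's actual scheduler: the `Task` and `ToyDaemon` names, the slot count, and the "pick the first non-finishable victim" rule are all assumptions made up for illustration. It only shows how a full daemon could end up rejecting one attempt and preempting another.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    finishable: bool  # True when all upstream inputs are already available


@dataclass
class ToyDaemon:
    """Hypothetical stand-in for a daemon with a fixed number of slots."""
    slots: int
    running: list = field(default_factory=list)
    killed: list = field(default_factory=list)  # (task name, reason) pairs

    def submit(self, task: Task) -> str:
        # Free slot: accept the task, finishable or not.
        if len(self.running) < self.slots:
            self.running.append(task)
            return "accepted"
        # Daemon full: a finishable task may evict a non-finishable one.
        if task.finishable:
            for victim in self.running:
                if not victim.finishable:
                    self.running.remove(victim)
                    self.killed.append((victim.name, "preempted"))
                    self.running.append(task)
                    return "accepted after preemption"
        # Nothing to evict (or the new task is itself non-finishable).
        self.killed.append((task.name, "rejected"))
        return "rejected"


daemon = ToyDaemon(slots=2)
results = [
    daemon.submit(Task("reducer-1", finishable=False)),  # started early
    daemon.submit(Task("map-1", finishable=True)),
    daemon.submit(Task("map-2", finishable=True)),       # evicts reducer-1
    daemon.submit(Task("reducer-2", finishable=False)),  # daemon is full
]
for outcome in results:
    print(outcome)
print(daemon.killed)
```

In this sketch both entries in `daemon.killed` would show up as killed attempts, even though neither indicates a real failure; the preempted reducer would simply be scheduled again later.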

Last Visited	‎12-18-2017 10:58 PM

Member Since	‎12-06-2017 07:16 PM
Posts	2

Cloudera Community

Re: LLAP concurrency performance question