07-12-2018 04:11 PM
We have a few hundred users on our system, and we see serious performance degradation once the number of concurrent queries exceeds a mere 10 or so. I think we might be missing something, either in the configuration or in the way we're managing the cluster.
AverageScannerThreadConcurrency appears to be seriously affecting query performance.
The same query, which scans approximately 800 GB of data, runs fast without load but becomes very slow (20 min) under even a bit of load (just a couple of other big queries running).
AverageScannerThreadConcurrency: 28.664101859697162 (fast)
AverageScannerThreadConcurrency: 1.5204863450230135 (slow)
Any suggestions on how we can influence scanner thread concurrency to improve HDFS scan performance?
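In case it helps with comparing profiles, here is a rough sketch of how the counter could be pulled out of saved profile text and compared between runs. It assumes the profile contains lines in the same "AverageScannerThreadConcurrency: <number>" form as the ones quoted above; the function name `scanner_concurrency` and the inline sample strings are just illustrative, not part of any Impala tooling.

```python
import re

# Matches counter lines of the form quoted in this post, e.g.
# "AverageScannerThreadConcurrency: 28.664101859697162"
PATTERN = re.compile(r"AverageScannerThreadConcurrency:\s*([0-9]*\.?[0-9]+)")


def scanner_concurrency(profile_text):
    """Return every AverageScannerThreadConcurrency value found in the text."""
    return [float(value) for value in PATTERN.findall(profile_text)]


# Sample inputs copied from the two runs above; in practice these would be
# the full text of each saved query profile.
fast = scanner_concurrency("AverageScannerThreadConcurrency: 28.664101859697162")
slow = scanner_concurrency("AverageScannerThreadConcurrency: 1.5204863450230135")

# How many times lower the scanner concurrency is in the slow run.
ratio = fast[0] / slow[0]
print(f"fast={fast[0]:.2f} slow={slow[0]:.2f} ratio={ratio:.1f}x")
```

A profile with several scan nodes would return several values, so the list can be inspected per scan node rather than taking only the first entry.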