12-05-2017 08:19 AM - edited 12-05-2017 08:34 AM
It all depends on what you are looking for: throughput or latency, and whether you need query fault tolerance.
Spark SQL is fault tolerant; Impala is known for low latency.
Use Impala for exploratory analytics on large data sets.
Impala is not fault tolerant, meaning that if a machine goes down while a query is running on it, the query has to be re-run. However, in our environment (a large cluster) we hardly ever have this issue.
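
If you do want to guard against the occasional failed query, the re-run can be done on the client side. Below is a rough sketch only, not anything Impala ships: `run_with_retry` and the `run_query` callable are hypothetical names, and `run_query` stands in for whatever client call you use to submit SQL, assumed here to raise an exception when the query dies.

```python
import time


def run_with_retry(run_query, sql, max_attempts=3, backoff_seconds=0.0):
    """Re-submit `sql` through `run_query` until it succeeds or attempts run out.

    `run_query` is a hypothetical callable supplied by the caller; it is
    assumed to return the result rows or raise an exception on failure.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query(sql)
        except Exception as exc:  # assumption: any exception means the query failed
            last_error = exc
            if attempt < max_attempts:
                # Wait a little longer after each failed attempt.
                time.sleep(backoff_seconds * attempt)
    raise RuntimeError(f"query failed after {max_attempts} attempts") from last_error
```

This only makes sense for queries that are safe to run twice (plain SELECTs, for example), since the whole query starts over from scratch each time.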