New Contributor
Posts: 2
Registered: 12-05-2017

SparkSQL vs Impala: which one should I use

Is there any benefit to querying HDFS data with Spark SQL instead of Impala from PySpark code?

Champion
Posts: 600
Registered: 05-16-2016

Re: SparkSQL vs Impala: which one should I use

It all depends on what you are looking for: throughput or latency, and whether you need query fault tolerance.

Spark SQL is fault tolerant; Impala is known for low latency.

Use Impala for exploratory analytics on large data sets.

Impala is not fault tolerant, meaning that if a machine running the query goes down, the query has to be re-run. However, in our environment (a large cluster) we hardly ever have this issue.
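Because a failed Impala query has to be re-run from the start, some teams wrap submission in a client-side retry. The following is a minimal sketch of that idea, not an Impala feature: `run_query` is a hypothetical zero-argument callable that executes the SQL through whatever Impala client you use and returns the rows. The retry count, the linear backoff, and treating every exception as retryable by default are all assumptions to tune.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def run_with_retries(
    run_query: Callable[[], T],
    max_attempts: int = 3,
    backoff_seconds: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Re-run the whole query when it fails, up to max_attempts times (hypothetical helper).

    A failed query is not resumed, so the only recovery is to submit it again.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Linear backoff gives a restarting node some time before the next attempt.
            time.sleep(backoff_seconds * attempt)
    raise AssertionError("unreachable")
```

Narrowing `retry_on` to connection or node-failure errors avoids pointlessly re-running queries that fail for deterministic reasons such as a SQL syntax error.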