SparkSQL vs Impala: which one should I use


Is there any benefit to querying HDFS data with SparkSQL instead of Impala from PySpark code?


Re: SparkSQL vs Impala: which one should I use


It all depends on what you are looking for: throughput or latency? Do you need query fault tolerance?

SparkSQL is fault tolerant; Impala is known for low latency.

Use Impala for exploratory analytics on large data sets.

Impala is not fault tolerant, meaning that if a node running the query goes down, the query has to be re-run. However, in our environment (a large cluster) we hardly ever have this issue.
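
If the occasional re-run is a concern, one option is to handle it on the client side. Below is a minimal sketch, not tied to any particular Impala client library: `run_with_retry` and `flaky_query` are hypothetical names made up for illustration, and `flaky_query` only simulates a node failure. In real code you would pass a function that submits your query through whatever client you use, and catch that client's specific error type instead of a bare `Exception`.

```python
import time


def run_with_retry(run_query, max_attempts=3, delay_seconds=0.0):
    """Call run_query() until it succeeds or max_attempts is reached.

    run_query is any zero-argument callable that submits the query and
    returns its result (hypothetical; supply your own).
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except Exception as error:  # assumption: narrow this to your client's error type
            last_error = error
            if attempt < max_attempts:
                time.sleep(delay_seconds)  # brief pause before re-running
    raise RuntimeError(f"query failed after {max_attempts} attempts") from last_error


# Simulated query: fails once (as if a node went down), then succeeds.
calls = {"count": 0}


def flaky_query():
    calls["count"] += 1
    if calls["count"] < 2:
        raise ConnectionError("simulated node failure")
    return [("row", 1)]


rows = run_with_retry(flaky_query)
print(rows)
```

Note that this re-runs the whole query from the start, which is the behaviour described above. SparkSQL, by contrast, can recover from a lost executor by recomputing only the affected tasks.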
