SparkSQL vs Impala: which one should I use

Is there any benefit to querying HDFS data with SparkSQL instead of Impala from PySpark code?

Re: SparkSQL vs Impala: which one should I use

It all depends on what you are looking for: throughput, latency, or query fault tolerance.

SparkSQL is fault tolerant; Impala is known for low latency.

Use Impala for exploratory analytics on large data sets.

Impala is not fault tolerant, meaning that if a machine running the query goes down, the query fails and has to be re-run. However, in our environment, which is a large cluster, we hardly ever have this issue.
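
If re-running a failed Impala query by hand is a concern, the re-run can be automated on the client side. Below is a minimal sketch in plain Python, not anything built into Impala. `run_with_retry` and its `run_query` argument are hypothetical names: `run_query` stands for whatever function you already use to submit SQL and return rows, and the attempt count and delay are arbitrary.

```python
import time


def run_with_retry(run_query, sql, attempts=3, delay_seconds=0.0):
    """Call run_query(sql) and re-submit the whole query if it raises.

    run_query is a hypothetical callable supplied by the caller; it is
    assumed to raise an exception when the query fails (for example,
    because a node went down mid-query).
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return run_query(sql)
        except Exception as exc:  # in real code, catch your client's specific error type
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_seconds)  # brief pause before re-submitting
    raise RuntimeError(f"query failed after {attempts} attempts") from last_error
```

This only restarts the query from scratch, so it suits short, low-latency queries. For long-running jobs where redoing all the work is expensive, SparkSQL's fault tolerance is the better fit.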