
Using Impala to query from Kudu or S3


Explorer

Hi,

I am wondering what the performance difference would be between using Impala to query data from S3 and from a Kudu table. I presume a Kudu table stores its data on HDFS, which I think is more expensive than storing data on S3. Based on this, what would be the best option?

 

1 REPLY

Re: Using Impala to query from Kudu or S3

Contributor

Kudu doesn't actually store its data on HDFS; it's built to be completely independent of HDFS. That said, the "best" option depends largely on your use case.

 

 

As far as I know, S3 doesn't have many optimizations for updating existing data or for indexing brand-new data as it comes in. That is exactly what Kudu is designed for. If you don't need that for your use case, then S3 is likely the winner.
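
To make the difference a bit more concrete, here is a rough sketch of how the two kinds of table might be declared from Impala, written as small Python helpers that only build the SQL strings. The table names, columns, partition count, and bucket path are all made up for illustration, and the partitioning choice is an assumption rather than a recommendation.

```python
# Sketch only: table, column, and bucket names are hypothetical.

def kudu_table_ddl(table: str = "events_kudu") -> str:
    """Impala DDL for a Kudu-backed table. Kudu tables need a primary key."""
    return (
        f"CREATE TABLE {table} (\n"
        "  id BIGINT,\n"
        "  payload STRING,\n"
        "  PRIMARY KEY (id)\n"
        ")\n"
        "PARTITION BY HASH (id) PARTITIONS 4\n"
        "STORED AS KUDU"
    )


def s3_table_ddl(
    table: str = "events_s3",
    location: str = "s3a://example-bucket/events/",
) -> str:
    """Impala DDL for an external Parquet table whose files live on S3."""
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        "  id BIGINT,\n"
        "  payload STRING\n"
        ")\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


def kudu_upsert(table: str = "events_kudu") -> str:
    """Row-level insert-or-update, which Impala supports for Kudu tables."""
    return f"UPSERT INTO {table} VALUES (1, 'updated payload')"


if __name__ == "__main__":
    for statement in (kudu_table_ddl(), s3_table_ddl(), kudu_upsert()):
        print(statement + ";\n")
```

The generated statements could be pasted into impala-shell. The point is that the Kudu table takes row-level changes such as UPSERT directly, while the S3 table is a set of files that you would normally append to or rewrite in bulk.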
