Which one is best: Hive, Impala, Drill, or Kudu, in combination with Spark SQL?

Assuming you want to access the data via Spark, the main question is how it should be stored.

For this, Drill is not supported by Cloudera, but Hive tables and Kudu are.

So it comes down to whether you want to store the data in Hive tables or in Kudu, as Spark can work with both.
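
As a rough illustration of Spark working with both, here is a minimal PySpark sketch that reads a Hive table through the metastore and a Kudu table through the kudu-spark connector. It assumes a SparkSession built with `enableHiveSupport()` and the kudu-spark jar on the classpath; the helper names (`kudu_read_options`, `load_both`) and any table names or master address you pass in are hypothetical placeholders.

```python
# Minimal sketch (assumptions): PySpark with Hive support enabled and the
# kudu-spark connector jar on the classpath. Helper names, table names, and
# host names are hypothetical placeholders, not part of any official API.

# Data source name registered by the kudu-spark connector.
KUDU_FORMAT = "org.apache.kudu.spark.kudu"


def kudu_read_options(kudu_master: str, kudu_table: str) -> dict:
    """Build the options the kudu-spark data source expects."""
    return {"kudu.master": kudu_master, "kudu.table": kudu_table}


def load_both(spark, hive_table: str, kudu_master: str, kudu_table: str):
    """Return (hive_df, kudu_df) read through the same SparkSession."""
    # Hive table: resolved via the Hive metastore (needs enableHiveSupport()).
    hive_df = spark.table(hive_table)
    # Kudu table: read through the kudu-spark connector.
    kudu_df = (
        spark.read.format(KUDU_FORMAT)
        .options(**kudu_read_options(kudu_master, kudu_table))
        .load()
    )
    return hive_df, kudu_df
```

For example, `load_both(spark, "warehouse.sales", "kudu-master.example.com:7051", "events")` would return two DataFrames that can be registered with `createOrReplaceTempView` and joined in a single Spark SQL query.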

If you want to insert your data record by record, or want to run interactive queries in Impala, then Kudu is likely the best choice.

If you want to insert and process your data in bulk, then Hive tables are usually a good fit.
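
To make the rule of thumb concrete, the minimal PySpark sketch that follows encodes it as a hypothetical `suggested_store` helper and shows a bulk append from a DataFrame into each store. It assumes Hive support, the kudu-spark connector, and target tables that already exist; all helper and table names are hypothetical. It only covers Spark batch writes: record-by-record ingestion into Kudu would normally go through Impala or a Kudu client and is not shown here.

```python
# Minimal sketch (assumptions): PySpark with Hive support, the kudu-spark
# connector jar on the classpath, and target tables that already exist.
# Helper names are hypothetical illustrations of the rule of thumb.

# Data source name registered by the kudu-spark connector.
KUDU_FORMAT = "org.apache.kudu.spark.kudu"


def suggested_store(record_by_record: bool, interactive_impala: bool) -> str:
    """Kudu for row-level inserts or interactive Impala queries,
    Hive tables for bulk insert and processing."""
    if record_by_record or interactive_impala:
        return "kudu"
    return "hive"


def write_batch_to_hive(df, hive_table: str) -> None:
    # Bulk append into an existing Hive table (columns matched by position).
    df.write.mode("append").insertInto(hive_table)


def write_batch_to_kudu(df, kudu_master: str, kudu_table: str) -> None:
    # Append the DataFrame's rows to an existing Kudu table via kudu-spark.
    (
        df.write.format(KUDU_FORMAT)
        .option("kudu.master", kudu_master)
        .option("kudu.table", kudu_table)
        .mode("append")
        .save()
    )
```

For instance, `suggested_store(record_by_record=False, interactive_impala=False)` returns `"hive"`, after which a nightly batch DataFrame could be passed to `write_batch_to_hive(df, "warehouse.sales")`.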


- Dennis Jaheruddin
