
What to use? When to use?



01. Apache Hive (Spark as execution engine) -> Elasticsearch

02. Apache Impala -> Apache Kudu

03. Apache Phoenix -> Apache HBase


Re: What to use? When to use?

Super Collaborator
In fact, the main question is what you want to achieve. You will have to decide on a storage engine first, and then pick the query interface that fits that storage decision.
Elasticsearch: implements a document database that stores JSON documents, with support for indexes and powerful queries. The typical feature that brings people to Elasticsearch is its powerful handling of indexes and queries on the data.
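To make the "JSON documents plus powerful queries" point concrete, here is a minimal sketch of what a document and a query body look like on the wire. The index, field names, and values are made up for illustration; only the query structure (a `bool` query combining `match` and `term` clauses) follows Elasticsearch's query DSL.

```python
import json

# A sample JSON document as you might index it into Elasticsearch.
# Field names and values here are illustrative, not from the thread.
doc = {
    "user": "alice",
    "message": "disk latency spiked on node-7",
    "timestamp": "2023-05-01T12:00:00Z",
    "tags": ["ops", "storage"],
}

# A typical query body: a full-text match on one field combined with
# an exact filter on another. This index-backed querying is the main
# draw of Elasticsearch.
query = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "disk latency"}}],
            "filter": [{"term": {"tags": "ops"}}],
        }
    }
}

# Both documents and queries travel as plain JSON over HTTP:
payload = json.dumps(query)
```

No cluster is needed to see the shape of the data; in practice you would POST `doc` and `payload` to the Elasticsearch REST API.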
Kudu: implements SQL-like table storage on parallel nodes, though it does not implement the SQL standard itself. It is nice for storing structured data from other SQL sources, and provides fast random access via indexes.
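A sketch of how this pairing looks in practice: Impala DDL creating a Kudu-backed table, followed by the kind of primary-key lookup Kudu answers quickly. The table and column names are hypothetical; the `STORED AS KUDU` and `PARTITION BY HASH` clauses are standard Impala syntax for Kudu tables.

```python
# Hypothetical Impala DDL for a Kudu-backed table. Structured,
# typed columns and a declared primary key are what make Kudu a
# natural target for data coming from other SQL sources.
create_stmt = """
CREATE TABLE metrics (
  host STRING,
  ts BIGINT,
  cpu DOUBLE,
  PRIMARY KEY (host, ts)
)
PARTITION BY HASH (host) PARTITIONS 4
STORED AS KUDU
"""

# Fast random access then goes through the primary key:
point_lookup = (
    "SELECT cpu FROM metrics "
    "WHERE host = 'node-7' AND ts = 1620000000"
)
```

You would submit these statements through Impala (e.g. impala-shell or a JDBC/ODBC client); they are shown as strings here so the sketch stands alone.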
HBase: implements the BigTable concept, basically storing tables column-oriented. This is a nice concept if you have millions of rows with thousands of columns where many columns are sparsely filled (so typically no 'normalized' data). It allows very fast inserts and brings versioning within the table; access works best when it can always go through the one row-key index.
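The BigTable model described above can be sketched as a plain in-memory structure, assuming illustrative row keys and family names: rows are addressed by a single key, columns are grouped into families, each cell keeps timestamped versions, and absent columns simply are not stored.

```python
# A minimal in-memory sketch of HBase's data model (names are made up).
# Structure: row key -> column family -> qualifier -> [(timestamp, value)].
# Sparse columns cost nothing: they are just missing from the dict.
table = {
    "row-001": {                 # row key: the one index you query by
        "info": {                # column family
            "name": [(1700000000, "alice")],
        },
        "metrics": {
            # two timestamped versions of the same cell
            "cpu": [(1700000100, "0.71"), (1700000000, "0.65")],
        },
    },
    "row-002": {
        "info": {"name": [(1700000000, "bob")]},
        # no "metrics" family stored for this row at all
    },
}

def latest(row_key, family, qualifier):
    """Return the newest version of a cell, like a default HBase Get."""
    versions = table[row_key][family][qualifier]
    return max(versions)[1]  # highest timestamp wins
```

Usage: `latest("row-001", "metrics", "cpu")` returns the most recent CPU value; the real HBase API (e.g. a Get via happybase or the Java client) behaves analogously but against the distributed store.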
All three store data across parallel nodes to scale with the number of nodes, but they take different approaches. For most tasks a solution is possible in any of them, but the performance and the effort needed to get there may differ.
There are of course many more features and differences, but maybe this gives you a first idea of the concepts.