I understand that mechanism of Hive(with Tez) and Hive (with MR) is different from traditional RDBMS databases. We have a set of analysts who perform : "select * from view limit n" kind of queries many times.
Since all analysts/BI users come from traditional RBDMS background, users do compare the waiting time for RDBMS and for Hive Query to return results.
For example :
select top 10 * from db.view ;
using SQL server and on a much larger dataset, takes the following times to complete:
run 1: 0 seconds run 2: 0 seconds run 3: 0 seconds ...........
When running the same query through Hive over Knox (or even with beeline), it takes much higher time.
SELECT * FROM db.view limit 10 Takes the following times to complete via hive over Knox or via Ambari View or with beeline.
run 1: 36 seconds run 2: 18 seconds run 3: 38 seconds .......
This is one example of a db/table combination, but this is a common scenario for mostly all the tables in a few databases.
I tried analyze and compute statistics on underlying tables on which these queries are run, but query times did not change.
I understand that, we are not comparing apple to apple here, but this question is more to do with improvement of end user experience, and how best can we help to avoid long wait times?
What does the query look like in the "db.view"? How many joins, formats of underlying table, where clauses etc. If you're just doing a straight "select * from table limit 10" I'd expect it to return almost immediately but any kind of processing will take a few seconds. Have you tried LLAP/Interactive Queries?