Support Questions

smartninja723 · ‎08-22-2017

Hi,

I understand that mechanism of Hive(with Tez) and Hive (with MR) is different from traditional RDBMS databases. We have a set of analysts who perform : "select * from view limit n" kind of queries many times.

Since all analysts/BI users come from traditional RBDMS background, users do compare the waiting time for RDBMS and for Hive Query to return results.

For example :

select top 10 * from db.view ;

using SQL server and on a much larger dataset, takes the following times to complete:

run 1: 0 seconds
run 2: 0 seconds
run 3: 0 seconds ...........

When running the same query through Hive over Knox (or even with beeline), it takes much higher time.

SELECT * FROM db.view limit 10
Takes the following times to complete via hive over Knox or via Ambari View or with beeline.

run 1: 36 seconds
run 2: 18 seconds
run 3: 38 seconds .......

This is one example of a db/table combination, but this is a common scenario for mostly all the tables in a few databases.

I tried analyze and compute statistics on underlying tables on which these queries are run, but query times did not change.

I understand that, we are not comparing apple to apple here, but this question is more to do with improvement of end user experience, and how best can we help to avoid long wait times?

(This is on HDP 2.6.x)

Regards,

SS

sweeks · ‎08-22-2017

What does the query look like in the "db.view"? How many joins, formats of underlying table, where clauses etc. If you're just doing a straight "select * from table limit 10" I'd expect it to return almost immediately but any kind of processing will take a few seconds. Have you tried LLAP/Interactive Queries?

Cloudera Community

Support Questions

Speeding up : select "n" rows from a view (select * from view limit n) without any filter condition.