Member since
01-05-2016
4
Posts
1
Kudos Received
0
Solutions
01-08-2016
10:28 PM
I tried "set hive.limit.pushdown.memory.usage=0.3;" and running the using Rank but it still take 800s. Thanks for taking up this ticket so quickly! Once you finish your patch will it be available on HIVE-12808? If so, would it be possible to clone the Hive repo and apply your patch and compile and test it?
... View more
01-07-2016
10:42 AM
I've tried this and it still doesn't seem to push down into the Windowing Function. I'm still getting the full table scan at the beginning of my execution plan.
... View more
01-06-2016
08:14 AM
hive> set explain_level=extended;
hive> explain SELECT * FROM latestposition WHERE regionid='1d6a0be1-6366-4692-9597-ebd5cd0f01d1' and id=1422792010 and deviceid='6c5d1a30-2331-448b-a726-a380d6b3a432';
OK
Plan not optimized by CBO.
Vertex dependency in root stage
Reducer 2 <- Map 1 (SIMPLE_EDGE)
Stage-0
Fetch Operator
limit:-1
Stage-1
Reducer 2
File Output Operator [FS_10]
compressed:false
Statistics:Num rows: 3 Data size: 3264 Basic stats: COMPLETE Column stats: COMPLETE
table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
Select Operator [SEL_4]
outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]
Statistics:Num rows: 3 Data size: 3264 Basic stats: COMPLETE Column stats: COMPLETE
Filter Operator [FIL_11]
predicate:((((ROW_NUMBER_window_0 = 1) and (_col9 = '1d6a0be1-6366-4692-9597-ebd5cd0f01d1')) and (_col10 = 1422792010)) and (_col0 = '6c5d1a30-2331-448b-a726-a380d6b3a432')) (type: boolean)
Statistics:Num rows: 3 Data size: 3264 Basic stats: COMPLETE Column stats: COMPLETE
PTF Operator [PTF_3]
Function definitions:[{"Input definition":{"type:":"WINDOWING"}},{"order by:":"_col10(DESC)","name:":"windowingtablefunction","partition by:":"_col9, _col10, _col0"}]
Statistics:Num rows: 19535311 Data size: 21254418368 Basic stats: COMPLETE Column stats: COMPLETE
Select Operator [SEL_2]
| outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10"]
| Statistics:Num rows: 19535311 Data size: 21254418368 Basic stats: COMPLETE Column stats: COMPLETE
|<-Map 1 [SIMPLE_EDGE] vectorized
Reduce Output Operator [RS_12]
key expressions:regionid (type: string), id (type: int), deviceid (type: string)
Map-reduce partition columns:regionid (type: string), id (type: int), deviceid (type: string)
sort order:+-+
Statistics:Num rows: 19535311 Data size: 17581916530 Basic stats: COMPLETE Column stats: COMPLETE
value expressions:datetime (type: string), radius (type: string), x (type: string), y (type: string), classifier (type: string), categoryid (type: string), accountid (type: string), mapid (type: string)
TableScan [TS_0]
alias:pos_regions_orc2
Statistics:Num rows: 19535311 Data size: 17581916530 Basic stats: COMPLETE Column stats: COMPLETE
Time taken: 0.774 seconds, Fetched: 37 row(s)
... View more
01-05-2016
10:10 PM
1 Kudo
Using Hive on Tez running this query against this view causes a full table scan even though there is a Partition on regionid and id. This query in Cloudera Impala takes 0.6s to complete and using Hortonworks Data Platform and Hive on Tez it takes 800s. I've come to the conclusion that in Hive on Tez using a window function prevents the predicate to be pushed down to the inner select causing the full table scan. CREATE VIEW latestposition AS
WITH t1 AS (
SELECT *, ROW_NUMBER() OVER ( PARTITION BY regionid, id, deviceid order by ts desc) AS rownos FROM positions
)
SELECT *
FROM t1
WHERE rownos = 1;
SELECT * FROM latestposition WHERE regionid='1d6a0be1-6366-4692-9597-ebd5cd0f01d1' and id=1422792010 and deviceid='6c5d1a30-2331-448b-a726-a380d6b3a432';
I've tried joining this table to itself using the MAX function to get the latest record, it works and finishes in a few seconds but it still is too slow for my use case. Also if I remove the window function the predicate gets pushed down and this will return in milliseconds.
If anyone has any ideas it would be much appreciated.
... View more
Labels:
- Labels:
-
Apache Hive
-
Apache Tez