Created 08-10-2017 07:08 PM
Could you remind me what's going on in this example explain plan? The table contains 611 rows, which I can see are being read. Then the key is not null predicate appears to be applied, and the number of rows drops to 306. There are no null fields in this dataset.
How is this pruning data? I would have expected the filter output to be the same size as the input.
Map Operator Tree:
    TableScan
      alias: a
      filterExpr: key is not null (type: boolean)
      Statistics: Num rows: 611 Data size: 1833 Basic stats: COMPLETE Column stats: NONE
      Filter Operator
        predicate: key is not null (type: boolean)
        Statistics: Num rows: 306 Data size: 918 Basic stats: COMPLETE Column stats: NONE
        Reduce Output Operator
          key expressions: key (type: string)
          sort order: +
          Map-reduce partition columns: key (type: string)
          Statistics: Num rows: 306 Data size: 918 Basic stats: COMPLETE Column stats: NONE
Created 08-11-2017 08:02 PM
These numbers (Num rows, Data size) are estimates made by the Hive optimizer and do not represent actual counts. You can run EXPLAIN ANALYZE on the query to see both the estimated and the actual numbers.
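For what it's worth, the 306 figure is consistent with the fallback Hive uses when column statistics are missing (note Column stats: NONE in the plan): the filter estimate assumes roughly half of the input rows pass a predicate it cannot evaluate. Below is a minimal sketch of that arithmetic in Python. The function name and the round-up step are assumptions chosen to match the plan above, not Hive's actual code.

```python
import math


def estimate_filter_output(num_rows, data_size):
    """Hypothetical helper: halve the row estimate and scale the data size.

    Assumption: the row estimate is rounded up, which matches 611 -> 306
    in the plan above. Hive's exact rounding may differ.
    """
    est_rows = math.ceil(num_rows / 2)
    # Average row width implied by the TableScan statistics.
    avg_row_size = data_size / num_rows
    est_size = round(est_rows * avg_row_size)
    return est_rows, est_size


rows, size = estimate_filter_output(611, 1833)
print(rows, size)
```

With the TableScan statistics from the plan (611 rows, 1833 bytes), this gives the same 306 rows and 918 bytes shown on the Filter Operator, so the drop reflects the estimate rather than rows actually being removed.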
Created 08-16-2017 06:18 PM
Thanks! This is exactly what I was looking for.