Contributor
Posts: 35
Registered: ‎04-07-2016
Accepted Solution

performance for views with union of different schema

Hi,

Quick question on performance. Suppose I have two tables, the first with columns "a, b" and the second with columns "c, d", and I create a view like the following:

CREATE VIEW my_view AS (
select a, b, null as c, null as d from table_1
union
select null, null, c, d from table_2)

Now if I run a simple query like:

select a from my_view

Will the query read only from table_1, or will all of table_2 be scanned as well?
(I am mainly worried about disk reads.)

Thanks

Cloudera Employee
Posts: 82
Registered: ‎12-07-2015

Re: performance for views with union of different schema

Hi Maurin,

You should be able to tell from the query profile. Run the query, then immediately afterwards run "profile;" in the Impala shell to display the profile, which includes information about the table scans. Feel free to post the profile here if you need help inspecting it.
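
As a rough sketch of that workflow in impala-shell, using the view name from the question (the EXPLAIN step is an optional extra that shows the plan without running the query; the exact labels in the output can differ by Impala version and storage layer):

```sql
-- Run the query first, then ask the shell for the profile of that query.
SELECT a FROM my_view;
PROFILE;

-- Optional: inspect the plan without executing the query.
-- Check which scan nodes are listed; for HDFS-backed tables they
-- usually show up as SCAN HDFS entries naming table_1 and/or table_2.
EXPLAIN SELECT a FROM my_view;
```

If both table_1 and table_2 appear as scan nodes, both tables are being read.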

 

Cheers, Lars

Cloudera Employee
Posts: 307
Registered: ‎10-16-2013

Re: performance for views with union of different schema

Hi Maurin,

Both tables have to be scanned to preserve SQL semantics; otherwise we would be changing the number of rows returned by your view. If you want to drop the second union operand, you can add "WHERE a IS NOT NULL" to your query, and then the second table will not be scanned.
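
A minimal sketch of the filtered query, assuming the view definition from the question. The second operand always produces NULL for the first column, so the predicate can never be true for rows coming from table_2:

```sql
-- The filter rules out every row from the table_2 branch of the union,
-- which is what allows that operand to be dropped from the plan.
SELECT a
FROM my_view
WHERE a IS NOT NULL;

-- To confirm, check that table_2 no longer appears as a scan node.
EXPLAIN SELECT a FROM my_view WHERE a IS NOT NULL;
```

Note that this filter also removes any table_1 rows where a is NULL, so it only matches the intent if those rows are not needed.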

 

Alex
