performance for views with union of different schema

Hi,

Quick question on performance. Suppose I have two tables, the first with columns "a, b" and the second with columns "c, d", and I create a view like the following:

CREATE VIEW my_view AS (
select a,b,null,null from table_1 
union 
select null,null,c,d from table_2)

Now if I do a simple query like :

select a from my_view

Will the query read only from table_1, or will all of table_2 be scanned as well?
(I am mainly worried about disk reads.)

Thanks

1 ACCEPTED SOLUTION

Hi Maurin,

Both tables have to be scanned to preserve SQL semantics: the rows of table_2 appear in the view with NULL in column a, so skipping table_2 would change the number of rows your query returns. If you want to drop the second union operand, add a "WHERE a IS NOT NULL" predicate to your query; the second table will then not be scanned.
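
As a rough sketch of how to check this, using the view and table names from the question, you can compare the plans of the two queries with EXPLAIN. The expectation stated in the comments comes from the explanation above, not from a verified plan, so treat it as something to confirm on your own cluster.

```sql
-- Unfiltered query: the plan is expected to contain scans of both
-- table_1 and table_2, because rows from table_2 show up as NULLs in "a".
EXPLAIN SELECT a FROM my_view;

-- Filtered query: in the second union operand "a" is a constant NULL,
-- so the predicate can never be true there. The planner can drop that
-- operand, and the table_2 scan should no longer appear in the plan.
EXPLAIN SELECT a FROM my_view WHERE a IS NOT NULL;
```

If the second plan still shows a scan of table_2, the operand was not pruned in your version and the filter only reduces the rows returned, not the disk reads.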

 

Alex

2 REPLIES


Hi Maurin,

You should be able to tell from the query profile. Run the query, then immediately afterwards run "profile;" in the Impala shell to display the profile, which includes information about the table scans. Feel free to post the profile here if you need help inspecting it.

 

Cheers, Lars
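
A minimal impala-shell session along these lines might look like the following. SUMMARY is an additional shell command, not mentioned above, that prints a condensed per-operator overview of the last query; it is often enough to see which tables were scanned before digging into the full profile.

```sql
-- Run the query whose scans you want to inspect.
SELECT a FROM my_view;

-- Condensed per-operator overview of the query that just ran;
-- look for the scan operators and the tables they read.
SUMMARY;

-- Full runtime profile of the same query, including details
-- for each scan node.
PROFILE;
```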
