Performance Reduced after Removing ORDER BY clause

Hi dear experts!

I am experimenting with Impala optimizations and, to optimize the data store, I want to apply an ORDER BY during the insert operation, like this:

create table tab_parquet like tab_text stored as parquet;
insert into tab_parquet select * from tab_text order by col1;

but I got this warning:

An ORDER BY appearing in a view, subquery, union operand, or an insert/ctas statement has no effect on the query result unless a LIMIT and/or OFFSET is used in conjunction with the ORDER BY

Does anybody know how to work around this?

PS. I use CDH 5.4:

# impala-shell --version
Impala Shell v2.2.0-cdh5 (2ffd73a) built on Tue Apr 21 12:09:21 PDT 2015

Thanks!

1 ACCEPTED SOLUTION

None of Impala's supported file formats are able to store data in sorted order on disk.

Therefore the ORDER BY clause in the INSERT does not have any effect. The data is written out in a potentially unsorted order regardless.
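
To make that concrete, here is a minimal sketch that reuses the table and column names from the question. It assumes nothing beyond what the warning itself says: since the ORDER BY is ignored inside the INSERT, the plain form below should load the same rows without triggering the warning, and any ordering you need can be requested when reading the data back. It does not produce sorted files on disk.

```sql
-- Same tables as in the question. The ORDER BY is left out of the INSERT
-- because Impala ignores it there (that is what the warning reports).
create table tab_parquet like tab_text stored as parquet;
insert into tab_parquet select * from tab_text;

-- Ask for the ordering at read time instead, where a top-level ORDER BY
-- does affect the query result.
select * from tab_parquet order by col1;
```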

Best,

Henry
