Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Performance Reduced after Removing ORDER BY clause


Hi dear experts!

 

I am experimenting with Impala optimizations and, to optimize the data store, I want to apply an ORDER BY during the insert operation, like this:

 

 

create table tab_parquet like tab_text stored as parquet;
insert into tab_parquet select * from tab_text order by col1;

 

But I got the following message:

 

An ORDER BY appearing in a view, subquery, union operand, or an insert/ctas statement has no effect on the query result unless a LIMIT and/or OFFSET is used in conjunction with the ORDER BY

 

Does anybody know how to work around this?

 

PS. I am using CDH 5.4:

# impala-shell --version
Impala Shell v2.2.0-cdh5 (2ffd73a) built on Tue Apr 21 12:09:21 PDT 2015

 

Thanks!

1 ACCEPTED SOLUTION


None of Impala's supported file formats are able to store data in sorted order on disk.

 

Therefore the ORDER BY clause in the INSERT does not have any effect. The data is written out in a potentially unsorted order regardless.
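
As a minimal sketch of what this implies, using the table and column names from the question: the load can be written without the ORDER BY, and any required ordering can be requested when the data is read back. The second statement is only an illustration of sorting at query time, not something the original post ran.

```sql
-- Load the Parquet table without ORDER BY; per the explanation above,
-- the clause would not change how the rows are written.
insert into tab_parquet select * from tab_text;

-- Ask for the ordering at read time instead (illustrative query).
select * from tab_parquet order by col1;
```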

 

Best,

Henry
