
Performance Reduced after Removing ORDER BY clause

Hi dear experts!

I am experimenting with Impala optimizations. To optimize the data store, I want to apply an ORDER BY during the insert operation, like this:

create table tab_parquet like tab_text stored as parquet;
insert into tab_parquet select * from tab_text order by col1;

But I got this warning:

An ORDER BY appearing in a view, subquery, union operand, or an insert/ctas statement has no effect on the query result unless a LIMIT and/or OFFSET is used in conjunction with the ORDER BY

Does anybody know how to work around this?

PS: I am using CDH 5.4:

# impala-shell --version
Impala Shell v2.2.0-cdh5 (2ffd73a) built on Tue Apr 21 12:09:21 PDT 2015

Thanks!

1 ACCEPTED SOLUTION

Re: Performance Reduced after Removing ORDER BY clause

None of Impala's supported file formats can store data in sorted order on disk.

Therefore, the ORDER BY clause in the INSERT has no effect. The data may be written out unsorted regardless.
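
A minimal sketch of what this means in practice, reusing the table names from the question. Sorting at query time is an assumption about what the asker needs, not something stated in the original post:

```sql
-- The ORDER BY would be ignored here, so it is left out of the INSERT.
insert into tab_parquet select * from tab_text;

-- Apply the ordering when reading the data back instead.
select * from tab_parquet order by col1;
```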

 

Best,

Henry
