
CodeGen way to slow


Contributor

Hi,
I am using Impala 2.7(8) with CDH 5.10.1.
I am trying a simple query:
`select distinct(date_col_partition) from table_1`

and it takes 20 seconds.
But when I first run set DISABLE_CODEGEN=true;

the same query takes less than a second.
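
For reference, the comparison described above could be reproduced in a single impala-shell session roughly as follows. This is only a sketch: `table_1` and `date_col_partition` are the placeholder names from the query above, and `SUMMARY` is the impala-shell command that prints per-operator timings for the most recent query.

```sql
-- Run once with codegen enabled (the default) and note the elapsed time.
SET DISABLE_CODEGEN=false;
SELECT DISTINCT date_col_partition FROM table_1;
SUMMARY;

-- Run again with codegen disabled for this session only.
SET DISABLE_CODEGEN=true;
SELECT DISTINCT date_col_partition FROM table_1;
SUMMARY;
```

The `SET` only affects the current session, so other users and queries keep the default behavior.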

 

Here is the profile gist: https://gist.github.com/anonymous/1a5faa3a10d4495f7b8abc3c964457db

Any idea what is going wrong?

Thanks


Re: CodeGen way to slow

Master Collaborator

Hi Maurin,

 

thanks for posting, this is pretty interesting. What is the type of your "cuberon_event_date" column?

 

Alex

Re: CodeGen way to slow

Master Collaborator

As an experiment, it would be interesting to try the query on the same data in a different file format, e.g., text. You can do a quick CREATE TABLE test AS SELECT * FROM <original_table> and then retry the query.
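
A rough sketch of that experiment, assuming the original table is the `table_1` from the question and using `table_1_text` as a hypothetical name for the copy:

```sql
-- Copy the data into a text-format table (table_1_text is a hypothetical name).
CREATE TABLE table_1_text STORED AS TEXTFILE AS SELECT * FROM table_1;

-- Re-run the original query against the copy with codegen left enabled.
SET DISABLE_CODEGEN=false;
SELECT DISTINCT date_col_partition FROM table_1_text;
```

Swapping `TEXTFILE` for `PARQUET` in the `STORED AS` clause would run the same test against Parquet. If the copy is fast with codegen enabled, that points at the original file format rather than the query itself.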

Re: CodeGen way to slow

Contributor

It is a string that looks like "YYYY-MM-DD".
The table is stored as Avro. I can try using Parquet or text if you want.


Re: CodeGen way to slow

Master Collaborator

Thanks. Trying Parquet would help. I just want to see if the high optimization time in codegen is due to some glitch specific to Avro.

Re: CodeGen way to slow

Master Collaborator

Does the table have a lot of columns or anything unusual like that?

Re: CodeGen way to slow

Contributor

It seems to be coming from Avro.
I created the table as Parquet and the query took 0.48 sec.
The table has about 900 columns, so nothing too fancy.

 

thanks

Re: CodeGen way to slow

Master Collaborator

Thanks for investigating. We've confirmed internally that the issue is related to Avro with many columns. 900 is somewhat wide.

 

Thanks for reporting! We'll continue to look into this issue.

Re: CodeGen way to slow

Contributor

Thanks!
If you open a JIRA, can you send me the link?
I will probably disable codegen for now and wait until you push a fix to re-enable it.
Thanks
