CodeGen way too slow

avatar
Rising Star

Hi,
I'm using Impala 2.7(8) with CDH 5.10.1.
I am trying a simple query:
`select distinct(date_col_partition) from table_1`

and it is taking 20 sec.
But when I first run `set DISABLE_CODEGEN=true;`, it takes less than a second.

Here is the profile gist: https://gist.github.com/anonymous/1a5faa3a10d4495f7b8abc3c964457db

Any idea what is going wrong?

Thanks

avatar

Hi Maurin,

Thanks for posting, this is pretty interesting. What is the type of your "cuberon_event_date" column?

Alex

avatar

As an experiment, it would be interesting to try the query with the same data using a different data format, e.g., text. You can do a quick CREATE TABLE test AS SELECT * FROM <original_table> and then retry the query.
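
A minimal sketch of that experiment, assuming the source table is `table_1` from the question. The names `table_1_text` and `table_1_parquet` are hypothetical, and the Parquet copy is added only as a second comparison point. Note that `SELECT *` in a CTAS writes the partition column as an ordinary column, so the copies are unpartitioned.

```sql
-- Hypothetical target names; table_1 is the Avro table from the question.
-- Copy the data into a text-format table.
CREATE TABLE table_1_text STORED AS TEXTFILE AS
SELECT * FROM table_1;

-- Same copy in Parquet, as a second comparison point.
CREATE TABLE table_1_parquet STORED AS PARQUET AS
SELECT * FROM table_1;

-- Re-run the slow query against each copy, with codegen left enabled.
SELECT DISTINCT date_col_partition FROM table_1_text;
SELECT DISTINCT date_col_partition FROM table_1_parquet;
```

If both copies return quickly with codegen enabled, that points at the Avro path rather than at the query itself.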

avatar
Rising Star

It is a string that looks like "YYYY-MM-DD".
The table is stored as Avro. I can try using Parquet or text if you want.

avatar

Thanks. Trying Parquet would help. Just want to see if the high optimization time in codegen is due to some glitch with Avro.

avatar

Does the table have a lot of columns or anything unusual like that?

avatar
Rising Star

It seems to be coming from Avro.
I created the table as Parquet and the query took 0.48 sec.
The table has about 900 columns, so nothing too fancy.

Thanks

avatar

Thanks for investigating. We've confirmed internally that the issue is related to Avro with many columns. 900 is somewhat wide.

 

Thanks for reporting! We'll continue to look into this issue.

avatar
Rising Star

Thanks!
If you open a JIRA, can you send me the link?
I will probably disable codegen for now and wait until you push a fix to re-enable it.
Thanks
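
For anyone landing here with the same problem, the workaround can be sketched as below, assuming the query runs in an impala-shell session and using the table and column names from the original question. `SET` applies only to the current session, so other sessions keep codegen enabled.

```sql
-- Turn codegen off for the current session only.
SET DISABLE_CODEGEN=true;

-- The query from the question; with codegen off it ran in under a second.
SELECT DISTINCT date_col_partition FROM table_1;

-- Turn codegen back on for queries in this session that benefit from it.
SET DISABLE_CODEGEN=false;
```

Scoping the option to the affected queries keeps codegen available elsewhere, where it normally speeds up execution.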
