Impala query takes a long time in the analysis and planning phases

I have two separate Hadoop clusters: a Cloudera Hadoop cluster and an Apache Hadoop cluster.

The same Impala query runs fast on the Cloudera cluster but slower on the Apache Hadoop cluster.

While the query runs, I see that on the Apache cluster it spends significantly more time in the analysis and planning phases than on the Cloudera cluster.

I have tuned the heap size configuration on the Apache cluster and tried to keep the same properties and values that I have in the Cloudera cluster.

What else should I double-check? Are there other services or configurations I need to tune?

Please advise.

Both clusters run on the same machine hardware configuration with the same instances.

Versions used in the Cloudera cluster:

CDH 6.3.2
impalad version 3.2.0

Versions used in the Apache cluster:

Hadoop 3.0.0
Impala 3.4.0


@manjj To understand the performance difference between the Cloudera Hadoop cluster and the Apache Hadoop cluster, I suggest collecting the Impala query profile for the same query on both clusters and comparing them.

To analyze an Impala query profile, see the following articles:

https://docs.cloudera.com/runtime/7.2.9/impala-reference/topics/impala-profile.html
https://conferences.oreilly.com/strata/strata-ca-2018/cdn.oreillystatic.com/en/assets/1/event/269/Ho...
https://cloudera.ericlin.me/2018/09/impala-query-profile-explained-part-1/
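
A minimal Python sketch of that comparison, assuming each profile has been saved as plain text (for example, by running the impala-shell `profile` command right after the query on each cluster). The section name `Query Compilation`, the event names, and the `- event: total (delta)` line layout are assumptions based on typical Impala 3.x text profiles, so check them against your own output. The helpers `parse_duration_ns`, `timeline_events` and `compare` are hypothetical, and the two sample profiles are made-up excerpts, not real measurements.

```python
import re

# Impala prints durations as concatenated parts such as "1s234ms", "56.489us",
# "2m3s" or "12ns". Assumption: only these units appear in timeline lines.
_UNIT_NS = {"h": 3600e9, "m": 60e9, "s": 1e9, "ms": 1e6, "us": 1e3, "ns": 1.0}
# "ms", "us" and "ns" are listed before "m" and "s" so they win the alternation.
_PART = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")
# Assumed event line layout: "- <event name>: <time since start> (<time since previous event>)".
_EVENT = re.compile(r"^\s*-\s*(?P<name>[^:]+):\s*(?P<total>\S+)\s*\((?P<delta>\S+)\)\s*$")


def parse_duration_ns(text):
    """Convert an Impala duration string such as '1s234ms' to nanoseconds."""
    parts = _PART.findall(text)
    if not parts or "".join(num + unit for num, unit in parts) != text:
        raise ValueError(f"unrecognized duration: {text!r}")
    return sum(float(num) * _UNIT_NS[unit] for num, unit in parts)


def timeline_events(profile_text, section="Query Compilation"):
    """Return {event name: step duration in ns} for the event lines indented
    under the `section` header of a text query profile."""
    events = {}
    in_section, section_indent = False, 0
    for line in profile_text.splitlines():
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if stripped.startswith(section + ":"):
            in_section, section_indent = True, indent
            continue
        if not in_section or not stripped:
            continue
        match = _EVENT.match(line)
        if match and indent > section_indent:
            events[match["name"].strip()] = parse_duration_ns(match["delta"])
        else:
            in_section = False  # the first non-event line ends the section
    return events


def compare(profile_a, profile_b, section="Query Compilation"):
    """Return (event, ms in A, ms in B) rows, largest difference first.
    An event missing from one profile is reported as 0 ms there."""
    a = timeline_events(profile_a, section)
    b = timeline_events(profile_b, section)
    rows = [(name, a.get(name, 0.0) / 1e6, b.get(name, 0.0) / 1e6)
            for name in a.keys() | b.keys()]
    rows.sort(key=lambda row: abs(row[2] - row[1]), reverse=True)
    return rows


# Hypothetical profile excerpts; the numbers are made up for illustration only.
CDH_PROFILE = """
    Query Compilation: 120.000ms
       - Metadata load finished: 20.000ms (20.000ms)
       - Analysis finished: 50.000ms (30.000ms)
       - Planning finished: 120.000ms (70.000ms)
    Query Timeline: 1s500ms
       - Query submitted: 40.000us (40.000us)
"""
APACHE_PROFILE = """
    Query Compilation: 3s120ms
       - Metadata load finished: 2s500ms (2s500ms)
       - Analysis finished: 2s800ms (300.000ms)
       - Planning finished: 3s120ms (320.000ms)
"""

if __name__ == "__main__":
    print(f"{'event':28s} {'A (ms)':>10s} {'B (ms)':>10s} {'B - A':>10s}")
    for name, a_ms, b_ms in compare(CDH_PROFILE, APACHE_PROFILE):
        print(f"{name:28s} {a_ms:10.1f} {b_ms:10.1f} {b_ms - a_ms:+10.1f}")
```

The script uses each event's delta (the value in parentheses) rather than its cumulative time, so the largest row points at the single step that grew the most between the two clusters, which is the place to start reading the full profiles.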

Cheers!