I have two separate Hadoop clusters: a Cloudera Hadoop cluster and an Apache Hadoop cluster.
An Impala query runs faster on the Cloudera cluster, whereas the same query runs slower on the Apache Hadoop cluster.
During query execution, I found that the query spends a significant amount of time in the analysis and planning phase on the Apache cluster compared to the Cloudera cluster.
I tuned the heap size configuration on the Apache cluster and tried to keep the same properties and values as in the Cloudera cluster.
What else do I need to double-check? Do I need to configure any other services or settings?
Both clusters use the same hardware configuration and the same instances.
Versions I used in Cloudera
impalad version 3.2.0
Versions I used in Apache
@manjj To understand the performance difference between the Cloudera Hadoop cluster and the Apache Hadoop cluster, I would suggest collecting the Impala query profile from both clusters and comparing them. To analyze an Impala query profile, please use the articles below.
https://docs.cloudera.com/runtime/7.2.9/impala-reference/topics/impala-profile.html
https://conferences.oreilly.com/strata/strata-ca-2018/cdn.oreillystatic.com/en/assets/1/event/269/Ho...
https://cloudera.ericlin.me/2018/09/impala-query-profile-explained-part-1/
Cheers!
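To make the comparison concrete, here is a minimal Python sketch that lines up the timeline events from two saved profiles (for example, the output of PROFILE in impala-shell run right after the query on each cluster). It is only an illustration, not an official tool. It assumes timeline lines look like "- Analysis finished: 6.162ms (4.376ms)", where the value in parentheses is the time spent in that step; the exact layout may differ between Impala versions, so adjust the pattern if yours does not match. The function names are made up for this example.

```python
import re

# Conversion factors to milliseconds for the duration units assumed
# to appear in the profile text (e.g. "374.918us", "16.268ms", "1s250ms").
_UNIT_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0, "s": 1e3, "m": 6e4, "h": 3.6e6}
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|h|m|s)")
_DURATION = r"(?:\d+(?:\.\d+)?(?:ns|us|ms|h|m|s))+"

# Assumed timeline line shape: "- <label>: <elapsed> (<step time>)".
_EVENT = re.compile(
    r"^\s*-\s*(?P<label>[^:]+):\s*" + _DURATION +
    r"\s*\((?P<delta>" + _DURATION + r")\)\s*$"
)


def to_ms(text):
    """Convert a duration such as '1s250ms' or '374.918us' to milliseconds."""
    return sum(float(value) * _UNIT_MS[unit] for value, unit in _TOKEN.findall(text))


def parse_timeline(profile_text):
    """Return {event label: step time in ms} for lines that look like timeline events."""
    events = {}
    for line in profile_text.splitlines():
        match = _EVENT.match(line)
        if match:
            # Keep the first occurrence if a label appears more than once.
            events.setdefault(match.group("label").strip(), to_ms(match.group("delta")))
    return events


def compare_timelines(fast, slow):
    """List (label, fast_ms, slow_ms) for shared events, largest slowdown first."""
    rows = [(label, fast[label], slow[label]) for label in fast if label in slow]
    return sorted(rows, key=lambda row: row[2] - row[1], reverse=True)
```

To use it, read each saved profile into a string (say, from hypothetical files profile_cloudera.txt and profile_apache.txt), pass both through parse_timeline, and print the rows from compare_timelines. The events at the top of the list are the steps where the slower cluster loses the most time, which should show whether the gap is in metadata loading, analysis, or planning.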