I have two separate Hadoop clusters: a Cloudera (CDH) cluster and an Apache Hadoop cluster.
The same Impala query runs faster on the Cloudera cluster and slower on the Apache Hadoop cluster.
On the Apache cluster, the query spends significantly more time in the analysis and planning phases than it does on the Cloudera cluster.
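To narrow down which planning step is slower, the per-step timings from the text query profile of each cluster can be compared side by side. The following is a minimal sketch, assuming the profile contains a "Query Compilation" section with entries of the form "- step name: total (delta)"; the section name and line layout may differ between Impala 3.2.0 and 3.4.0, and the sample text is made up for illustration, not taken from either cluster. The names `to_ms`, `section_events`, and `SAMPLE` are hypothetical helpers.

```python
import re

# One timeline entry, e.g. "- Analysis finished: 3s100ms (100.000ms)".
EVENT_RE = re.compile(
    r"^\s*-\s+(?P<name>[^:]+):\s+(?P<total>\S+)\s+\((?P<delta>[^)]+)\)\s*$"
)
# A duration is a run of number+unit parts, e.g. "3s210ms" or "0.000ns".
PART_RE = re.compile(r"(\d+(?:\.\d+)?)(ns|us|ms|h|m|s)")
UNIT_TO_MS = {"ns": 1e-6, "us": 1e-3, "ms": 1.0,
              "s": 1000.0, "m": 60000.0, "h": 3600000.0}


def to_ms(duration):
    """Convert a pretty-printed duration such as '3s210ms' to milliseconds."""
    return sum(float(value) * UNIT_TO_MS[unit]
               for value, unit in PART_RE.findall(duration))


def section_events(profile_text, section="Query Compilation"):
    """Return (step name, delta in ms) pairs listed under one profile section.

    Assumption: the section starts with '<section>:' and its entries follow
    directly, one per line, until the first line that is not an entry.
    """
    events = []
    in_section = False
    for line in profile_text.splitlines():
        if line.strip().startswith(section + ":"):
            in_section = True
            continue
        if in_section:
            match = EVENT_RE.match(line)
            if match is None:
                break
            events.append((match.group("name").strip(),
                           to_ms(match.group("delta"))))
    return events


# Illustrative profile excerpt (invented numbers, assumed layout).
SAMPLE = """
    Query Compilation: 3s210ms
       - Metadata of all 2 tables cached: 3s000ms (3s000ms)
       - Analysis finished: 3s100ms (100.000ms)
       - Planning finished: 3s210ms (110.000ms)
    Query Timeline: 4s000ms
       - Query submitted: 0.000ns (0.000ns)
"""

if __name__ == "__main__":
    for name, delta_ms in section_events(SAMPLE):
        print(f"{name}: {delta_ms:.3f} ms")
```

Running this over the profile of the same query from both clusters would show whether the extra time is spent loading table metadata, in analysis, or in plan generation, which in turn suggests whether to look at the catalog and metastore setup or elsewhere.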
I tuned the heap size configuration on the Apache cluster and tried to keep the same properties and values as on the Cloudera cluster.
What else should I double-check? Are there other services or configurations that I need to set up?
Please suggest.
Both clusters use the same machine hardware configuration and the same instances.
Versions used on the Cloudera cluster:
CDH 6.3.2
impalad version 3.2.0
Versions used on the Apache cluster:
Hadoop 3.0.0
Impala 3.4.0