Impala query spends much longer in the analysis and planning phases
Labels: Apache Impala
Created 08-29-2021 10:45 PM
I have two separate Hadoop clusters: a Cloudera (CDH) cluster and an Apache Hadoop cluster.
The same Impala query runs faster on the Cloudera cluster than on the Apache Hadoop cluster.
While the query was executing, I found that on the Apache cluster it spends significantly more time in the analysis and planning phases than on the Cloudera cluster.
I tuned the heap size settings on the Apache cluster and tried to keep the same properties and values that I have on the Cloudera cluster.
Both clusters use the same hardware configuration and the same instances.
What else should I double-check? Are there other services or configuration settings I need to change?
Please advise.
Versions used on the Cloudera cluster:
- CDH 6.3.2
- impalad version 3.2.0

Versions used on the Apache cluster:
- Hadoop 3.0.0
- Impala 3.4.0
Created 09-23-2021 05:18 AM
@manjj To understand the performance difference between the Cloudera Hadoop cluster and the Apache Hadoop cluster, I would suggest collecting the Impala query profile for the same query on both clusters and comparing them.
To analyze an Impala query profile, please refer to the following articles:
https://docs.cloudera.com/runtime/7.2.9/impala-reference/topics/impala-profile.html
https://conferences.oreilly.com/strata/strata-ca-2018/cdn.oreillystatic.com/en/assets/1/event/269/Ho...
https://cloudera.ericlin.me/2018/09/impala-query-profile-explained-part-1/
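One way to make that comparison concrete is to save each profile as text (for example with the `PROFILE;` command in impala-shell right after running the query) and line up the timeline events side by side. The script below is a minimal sketch, not a Cloudera tool: it assumes timeline lines look like `- <event>: <elapsed> (<delta>)`, that durations use Impala's pretty-printed units (`h`, `m`, `s`, `ms`, `us`, `ns`, e.g. `1s234ms`), and that event names such as "Analysis finished" or "Planning finished" match between the two versions, which may not hold between Impala 3.2.0 and 3.4.0. The function names and the `compare_profiles.py` file name are hypothetical.

```python
import re
import sys

# Conversion factors from Impala's pretty-printed time units to milliseconds.
UNIT_MS = {"h": 3_600_000.0, "m": 60_000.0, "s": 1_000.0,
           "ms": 1.0, "us": 0.001, "ns": 0.000001}

# One "<number><unit>" token, e.g. "12.345ms" or the "1s" in "1s234ms".
# Two-letter units come first so "ms" is not read as "m" followed by "s".
TOKEN_RE = re.compile(r"(\d+(?:\.\d+)?)(ms|us|ns|h|m|s)")
DURATION = r"(?:\d+(?:\.\d+)?(?:ms|us|ns|h|m|s))+"

# ASSUMED timeline line shape: "- <event>: <elapsed> (<delta since previous event>)".
EVENT_RE = re.compile(
    r"^\s*-\s*(?P<name>[^:]+):\s*(?P<total>" + DURATION + r")\s*"
    r"\((?P<delta>" + DURATION + r")\)"
)


def to_ms(text):
    """Convert a duration such as '1s234ms' to milliseconds."""
    return sum(float(value) * UNIT_MS[unit] for value, unit in TOKEN_RE.findall(text))


def timeline_deltas(profile_text):
    """Map each timeline event name to its delta (ms since the previous event).

    Only the first occurrence of each event name is kept.
    """
    deltas = {}
    for line in profile_text.splitlines():
        match = EVENT_RE.match(line)
        if match:
            deltas.setdefault(match.group("name").strip(), to_ms(match.group("delta")))
    return deltas


def compare(fast_text, slow_text):
    """Return (event, fast_ms, slow_ms, slow_minus_fast) rows, biggest gap first."""
    fast, slow = timeline_deltas(fast_text), timeline_deltas(slow_text)
    rows = [(name, fast[name], slow[name], slow[name] - fast[name])
            for name in fast if name in slow]
    return sorted(rows, key=lambda row: row[3], reverse=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_profiles.py cdh_profile.txt apache_profile.txt
    with open(sys.argv[1]) as fast_file, open(sys.argv[2]) as slow_file:
        for name, fast_ms, slow_ms, gap in compare(fast_file.read(), slow_file.read()):
            print(f"{name:40s} {fast_ms:12.3f} {slow_ms:12.3f} {gap:+12.3f}")
```

Comparing per-event deltas rather than cumulative elapsed times isolates the single step where the two clusters diverge, instead of letting one early slowdown inflate every later timestamp.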
Cheers!
