
Hive performance bad on higher configured cluster, why?


Rising Star

We set up two HDP 2.5 clusters: clusterA for development and clusterB for production.

                    clusterA       clusterB
Datanodes           11             40 (Type1: 21 datanodes without nodemanager; Type2: 19 datanodes with nodemanager)
Yarn nodemanagers   11             19
Host memory         384G           Type1: 64G; Type2: 1024G
CPU cores           32             Type1: 24; Type2: 120
Namenode HA         Not enabled    enabled
TestDFSIO results

----- TestDFSIO ----- : write
           Date & time: Mon Jan 09 11:10:37 CST 2017
       Number of files: 5
Total MBytes processed: 500.0
     Throughput mb/sec: 18.242183224488308
Average IO rate mb/sec: 21.3885440826416
 IO rate std deviation: 8.865912292307934
    Test exec time sec: 42.365

----- TestDFSIO ----- : read
           Date & time: Mon Jan 09 11:12:06 CST 2017
       Number of files: 5
Total MBytes processed: 500.0
     Throughput mb/sec: 57.32630130703967
Average IO rate mb/sec: 58.91874313354492
 IO rate std deviation: 10.594356939880925
    Test exec time sec: 25.383

----- TestDFSIO ----- : write
           Date & time: Mon Jan 09 11:33:05 CST 2017
       Number of files: 5
Total MBytes processed: 500.0
     Throughput mb/sec: 95.21995810321843
Average IO rate mb/sec: 97.78007507324219
 IO rate std deviation: 15.6598450438094
    Test exec time sec: 45.938

----- TestDFSIO ----- : read
           Date & time: Mon Jan 09 11:34:11 CST 2017
       Number of files: 5
Total MBytes processed: 500.0
     Throughput mb/sec: 424.08821034775235
Average IO rate mb/sec: 433.62176513671875
 IO rate std deviation: 66.83194258523632
    Test exec time sec: 49.354

On clusterA, the HQL query:

hive (default)> select count(1) from humep.hw_cpb_relation;
Query ID = root_20170109144956_148b0601-d8ed-4163-9b66-37005b09fbde
Total jobs = 1
Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1481424485054_16307)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED    158        158        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 12.75 s    
--------------------------------------------------------------------------------
OK
701585564
Time taken: 13.137 seconds, Fetched: 1 row(s)
hive (default)> 

On clusterB:

hive> select count(1) from humep.hw_cpb_relation;
Query ID = root_20170109144935_8a64bd30-a292-452f-ad46-d850d899a9b0
Total jobs = 1
Launching Job 1 out of 1

Status: Running (Executing on YARN cluster with App id application_1483672680049_40368)
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED    148        148        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 28.92 s    
--------------------------------------------------------------------------------
OK
771149258
Time taken: 31.576 seconds, Fetched: 1 row(s)
hive> 

Performance on clusterB is worse than on clusterA, even though clusterB's hardware is much better.

Could any expert advise how to investigate and tune this issue? Thanks.

8 REPLIES

Re: Hive performance bad on higher configured cluster, why?

Rising Star

I also noticed the following NameNode RPC metrics (last 1 month) in Ambari:

NameNode RPC                  clusterA                               clusterB
Client RPC processing time    min 0.078ms avg 0.251ms max 2.07ms     min 0.451ms avg 130.541ms max 23304.563ms
Client RPC Queue Wait time    min 0.05ms avg 0.684ms max 11.654ms    min 0.084ms avg 0.166ms max 1.157ms

Will this impact Hive performance?


Re: Hive performance bad on higher configured cluster, why?

Rising Star

The formatting of my post above is a mess. How do I delete it?


Re: Hive performance bad on higher configured cluster, why?

Rising Star

I also noticed the following NameNode RPC metrics (last 1 month) in Ambari.

Will this impact Hive performance?

NameNode RPC                  clusterA                               clusterB
Client RPC processing time    min 0.078ms avg 0.251ms max 2.07ms     min 0.451ms avg 130.541ms max 23304.563ms
Client RPC Queue Wait time    min 0.05ms avg 0.684ms max 11.654ms    min 0.084ms avg 0.166ms max 1.157ms

Re: Hive performance bad on higher configured cluster, why?

@Huahua Wei

You could compare the configuration of both the clusters at a more detailed level and then observe the differences closely.

You will end up nailing down the configuration property which is causing the degraded performance in one of the clusters.

Potential areas to look at: NameNode HA (enabled/disabled), memory settings, Hive configurations, and core-site configs.
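One way to do such a comparison is to diff the exported *-site.xml files from both clusters. The following is only a minimal sketch: it assumes the standard Hadoop `<configuration>/<property>/<name>/<value>` layout, the helper names `load_site_xml` and `diff_configs` are made up for illustration, and the two inline fragments just reuse yarn-site values posted later in this thread.

```python
import xml.etree.ElementTree as ET


def load_site_xml(text):
    """Parse the content of a Hadoop *-site.xml file into a {name: value} dict."""
    root = ET.fromstring(text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_configs(a, b):
    """Return {name: (value_in_a, value_in_b)} for every property that differs.

    A value of None means the property is not set on that side.
    """
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }


# Illustrative fragments only; real files would be read from disk or exported from Ambari.
CLUSTER_A_XML = """<configuration>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>8192</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>262144</value></property>
  <property><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value></property>
</configuration>"""

CLUSTER_B_XML = """<configuration>
  <property><name>yarn.scheduler.minimum-allocation-mb</name><value>8192</value></property>
  <property><name>yarn.scheduler.maximum-allocation-mb</name><value>524288</value></property>
  <property><name>yarn.scheduler.minimum-allocation-vcores</name><value>2</value></property>
</configuration>"""

if __name__ == "__main__":
    differences = diff_configs(load_site_xml(CLUSTER_A_XML), load_site_xml(CLUSTER_B_XML))
    for name, (value_a, value_b) in differences.items():
        print(f"{name}: clusterA={value_a} clusterB={value_b}")
```

Running the same diff over hdfs-site, core-site, hive-site and tez-site would narrow the list of candidate properties to only those that actually differ.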


Re: Hive performance bad on higher configured cluster, why?

Rising Star

Does enabling NameNode HA impact performance a lot?


Re: Hive performance bad on higher configured cluster, why?

Yes, enabling HA will definitely have some impact on performance.

If my answer has helped you, kindly consider accepting it.

Thank you.

Re: Hive performance bad on higher configured cluster, why?

Expert Contributor
@Huahua Wei

What about the YARN and Hive settings? For example:

yarn.nodemanager.resource.memory-mb

min/max container memory

min/max container sizes

Tez container size

All of these settings can be found in Ambari under YARN/Tez/MapReduce. I would ensure that they are set to take full advantage of all cluster resources.


Re: Hive performance bad on higher configured cluster, why?

Rising Star

@mliem

The following are the settings for the two clusters:

parameters                                  clusterA     clusterB
yarn.nodemanager.resource.memory-mb         327680 MB    923648 MB
yarn.scheduler.minimum-allocation-mb        8192 MB      8192 MB
yarn.scheduler.maximum-allocation-mb        262144 MB    524288 MB
yarn.scheduler.minimum-allocation-vcores    1            2
yarn.scheduler.maximum-allocation-vcores    25           96
tez.am.resource.memory.mb                   8192 MB      8192 MB
hive.tez.container.size                     8192 MB      26624 MB
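
As a rough sanity check on these numbers, the sketch below estimates how many Tez containers fit into each cluster by memory alone. It is a back-of-the-envelope estimate, not a statement of how YARN actually scheduled these queries: it ignores vcores, the Tez AM container and queue limits, and it assumes that the scheduler rounds each container request up to a multiple of yarn.scheduler.minimum-allocation-mb. The function names are made up for illustration.

```python
import math


def containers_per_node(nm_memory_mb, container_mb, min_alloc_mb):
    """Memory-only estimate of how many containers fit on one NodeManager.

    Assumption: the scheduler rounds each request up to a multiple of the
    minimum allocation, so a 26624 MB request with an 8192 MB minimum is
    treated as 32768 MB.
    """
    granted_mb = math.ceil(container_mb / min_alloc_mb) * min_alloc_mb
    return nm_memory_mb // granted_mb


def cluster_capacity(cfg):
    """Memory-only estimate of concurrent containers across all NodeManagers."""
    per_node = containers_per_node(
        cfg["nm_memory_mb"], cfg["container_mb"], cfg["min_alloc_mb"]
    )
    return per_node * cfg["nodemanagers"]


# Values copied from the settings and hardware tables posted in this thread.
CLUSTERS = {
    "clusterA": {
        "nodemanagers": 11,
        "nm_memory_mb": 327680,   # yarn.nodemanager.resource.memory-mb
        "container_mb": 8192,     # hive.tez.container.size
        "min_alloc_mb": 8192,     # yarn.scheduler.minimum-allocation-mb
    },
    "clusterB": {
        "nodemanagers": 19,
        "nm_memory_mb": 923648,
        "container_mb": 26624,
        "min_alloc_mb": 8192,
    },
}

if __name__ == "__main__":
    for name, cfg in CLUSTERS.items():
        print(name, cluster_capacity(cfg))
```

Under these assumptions the estimate is 440 containers for clusterA and 532 for clusterB, both above the 158 and 148 map tasks in the query output, so the memory slot count alone would not obviously explain the slower run.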
