Created 07-19-2017 07:48 AM
I am facing Hive errors intermittently; garbage collection issues are indicated in the logs.
hiveserver2:
@dh01 hive]$ cat hiveserver2.log | grep 'GC'
at org.apache.tez.dag.api.client.DAGClientHandler.submitDAG(DAGClientHandler.java:118)
at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolBlockingPBServerImpl.submitDAG(DAGClientAMProtocolBlockingPBServerImpl.java:163)
at org.apache.tez.dag.api.client.rpc.DAGClientAMProtocolRPC$DAGClientAMProtocol$2.callBlockingMethod(DAGClientAMProtocolRPC.java:7471)
2017-07-17 14:00:22,815 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@59fc6d05]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1913ms
GC pool 'PS Scavenge' had collection(s): count=1 time=1961ms
2017-07-17 14:14:28,531 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@59fc6d05]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1452ms
GC pool 'PS Scavenge' had collection(s): count=1 time=1701ms
2017-07-17 15:04:32,309 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@59fc6d05]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1838ms
GC pool 'PS Scavenge' had collection(s): count=1 time=2195ms
2017-07-17 16:08:45,121 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@59fc6d05]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 1568ms
GC pool 'PS Scavenge' had collection(s): count=1 time=1707ms
hivemetastore:
@dh01 hive]$ cat hivemetastore.log | grep -i "GC pool"
GC pool 'PS Scavenge' had collection(s): count=1 time=3521ms
GC pool 'PS MarkSweep' had collection(s): count=1 time=11097ms
GC pool 'PS Scavenge' had collection(s): count=1 time=37ms
@dh01 hive]$ cat hivemetastore.log | grep -i "JvmPauseMonitor"
2017-07-19 04:26:50,008 INFO [org.apache.hadoop.util.JvmPauseMonitor$Monitor@4f85aca0]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(195)) - Detected pause in JVM or host machine (eg GC): pause of approximately 3050ms
2017-07-19 11:01:32,392 WARN [org.apache.hadoop.util.JvmPauseMonitor$Monitor@4f85aca0]: util.JvmPauseMonitor (JvmPauseMonitor.java:run(191)) - Detected pause in JVM or host machine (eg GC): pause of approximately 10915ms
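To get a feel for how severe the pauses are, I have been pulling just the durations out of the JvmPauseMonitor lines. This is only a rough sketch based on the log format shown above (swap in hivemetastore.log for the Metastore):

# List the longest JvmPauseMonitor pauses (assumes the log format shown above)
grep 'Detected pause' hiveserver2.log | sed 's/.*approximately //' | sort -n | tail -5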
HiveServer2 Heap Size = 24210 MB (had already been set)
Metastore Heap Size = 12288 MB (changed from 8 GB previously)
Client Heap Size = 2 GB (changed from 1 GB previously)
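For reference, this is roughly how those Ambari settings end up in hive-env.sh on HDP; it is a hedged sketch only, the exact template and variable names can differ by version, so verify against your own hive-env:

# Sketch of hive-env.sh heap handling (values in MB); confirm against your HDP version
if [ "$SERVICE" = "metastore" ]; then
  export HADOOP_HEAPSIZE=12288    # Metastore Heap Size
else
  export HADOOP_HEAPSIZE=24210    # HiveServer2 Heap Size
fi
export HADOOP_CLIENT_OPTS="$HADOOP_CLIENT_OPTS -Xmx${HADOOP_HEAPSIZE}m"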
I did read the article below and the provided links, which were helpful:
but after making the changes to the indicated heap sizes, I still had instances where the HiveServer2 or Metastore service would go into alert in Ambari for a few seconds and then come back healthy.
The logs below did not have any errors in these instances:
hive.out
hive.log
hive-server2.out
hive-server2.log
hivemetastore.log
hiveserver2.log
Am I missing something? Would setting the HiveServer2 Heap Size and the Metastore Heap Size to the same value help, i.e. setting HiveServer2 Heap Size = 12288 MB?
Environment:
Hadoop 2.7.1.2.4.0.0-169
hive-meta-store - 2.4.0.0-169
hive-server2 - 2.4.0.0-169
hive-webhcat - 2.4.0.0-169
Ambari 2.2.1.0
Created 07-19-2017 05:38 PM
How many users are connecting to your HiveServer2 concurrently? That determines your memory. From the Hortonworks recommendations, for 20 concurrent users you need a mere 6 GB. If you have 10 concurrent connections, 4 GB is enough, and for a single connection 2 GB, so you definitely don't want to go below that.
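If you are not sure how many concurrent connections you actually have, a rough way to check from the HiveServer2 host (assuming the default Thrift port 10000; adjust if you changed it) is:

# Count established client connections to HiveServer2 (assumes default port 10000)
netstat -tan | grep ':10000 ' | grep ESTABLISHED | wc -l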
When you have too much memory, you run into what are called "stop-the-world" garbage collection pauses. You can Google more on this, but basically the JVM needs to move objects and update the references to them. If it moves an object before updating the references and the running application accesses it through the old reference, there is trouble. If it updates the reference first and then tries to move the object, the updated reference is wrong until the object has actually moved, and any access in that window will cause an issue.
For both the CMS and the Parallel collector, the young-generation collection algorithm is similar and it is stop-the-world, that is, the application is stopped while the collection is happening.
When you allocate too much memory, like 24 GB, the stop-the-world pauses take longer, hence your application fails.
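If you want to see how long the collections actually take rather than inferring it from JvmPauseMonitor, one option is to turn on GC logging. This is a sketch, assuming a Java 7/8 JVM and that your HiveServer2 process picks up extra JVM options through HADOOP_OPTS in hive-env.sh; the log path is just an example:

# Hedged sketch for hive-env.sh: enable GC logging (Java 7/8 option names)
export HADOOP_OPTS="$HADOOP_OPTS -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/hive/hiveserver2-gc.log"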
So, your Metastore does not need to have the same memory as HiveServer2. They are two different processes. If the Metastore is also running into similar issues, you can set it to 8 GB or less; that's still a lot of memory for just the Metastore.
Created 07-21-2017 07:32 AM
@mqureshi Thanks for getting back. I have reduced the HiveServer2 Heap Size to 20 GB and am observing the behavior; I intend to reduce it to 12 GB step-wise over the coming days.