SYMPTOM:

HiveServer2 hangs and cannot execute even a simple query such as show tables. During the investigation, we took several jstack dumps and found that the following thread was progressing very slowly.

"HiveServer2-Handler-Pool: Thread-86129" #86129 prio=5 os_prio=0 tid=0x00007f3ad9e1a800 nid=0x1003b runnable [0x00007f3a73b0a000]
   java.lang.Thread.State: RUNNABLE
	at java.util.HashMap$TreeNode.find(HashMap.java:1851)
	at java.util.HashMap$TreeNode.find(HashMap.java:1861)
	at java.util.HashMap$TreeNode.find(HashMap.java:1861)
	at java.util.HashMap$TreeNode.find(HashMap.java:1861)
	at java.util.HashMap$TreeNode.find(HashMap.java:1861)
	at java.util.HashMap$TreeNode.find(HashMap.java:1861)
	at java.util.HashMap$TreeNode.find(HashMap.java:1861)
	at java.util.HashMap$TreeNode.find(HashMap.java:1861)
	at java.util.HashMap$TreeNode.putTreeVal(HashMap.java:1979)
	at java.util.HashMap.putVal(HashMap.java:637)
	at java.util.HashMap.put(HashMap.java:611)
	at org.apache.hadoop.hive.ql.ppd.ExprWalkerProcFactory.extractPushdownPreds(ExprWalkerProcFactory.java:290)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory$DefaultPPD.mergeWithChildrenPred(OpProcFactory.java:746)
	at org.apache.hadoop.hive.ql.ppd.OpProcFactory$JoinerPPD.process(OpProcFactory.java:464)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:90)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatchAndReturn(DefaultGraphWalker.java:95)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:79)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:133)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:110)
	at org.apache.hadoop.hive.ql.ppd.PredicatePushDown.transform(PredicatePushDown.java:135)
	at org.apache.hadoop.hive.ql.optimizer.Optimizer.optimize(Optimizer.java:192)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:10167)
	at org.apache.hadoop.hive.ql.parse.CalcitePlanner.analyzeInternal(CalcitePlanner.java:211)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:227)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:406)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:290)
	at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1110)
	- locked <0x00000005c1e1bc18> (a java.lang.Object)
	at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:1104)
	at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:110)
	at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:181)
	at org.apache.hive.service.cli.operation.Operation.run(Operation.java:257)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:388)
	at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:375)
	at sun.reflect.GeneratedMethodAccessor65.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
	at com.sun.proxy.$Proxy45.executeStatementAsync(Unknown Source)
	at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:274)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:486)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
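
When several jstack dumps are taken, it can help to check mechanically whether the same thread stays inside the compile path across all of them. The following is a minimal sketch of such a check, not a tool that ships with Hive: the class name JstackScanSketch and the method threadsWithFrame are hypothetical, and the parser assumes the usual dump layout in which each thread section starts with a quoted thread name and each frame line starts with "at".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: lists threads in a jstack dump that contain a given frame. */
final class JstackScanSketch {

    /** Returns the names of threads whose stack has a frame containing frameFragment. */
    static List<String> threadsWithFrame(String dump, String frameFragment) {
        List<String> hits = new ArrayList<>();
        String current = null;      // name of the thread section being read
        boolean matched = false;    // true once the current thread was recorded
        for (String line : dump.split("\\R")) {
            if (line.startsWith("\"")) {
                // Header line looks like: "thread name" #id prio=... tid=...
                int end = line.indexOf('"', 1);
                current = end > 0 ? line.substring(1, end) : line;
                matched = false;
            } else if (current != null && !matched
                    && line.trim().startsWith("at ")
                    && line.contains(frameFragment)) {
                hits.add(current);
                matched = true;     // record each thread at most once
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        String dump = "\"HiveServer2-Handler-Pool: Thread-1\" #1 prio=5\n"
                + "   java.lang.Thread.State: RUNNABLE\n"
                + "\tat org.apache.hadoop.hive.ql.Driver.compile(Driver.java:406)\n"
                + "\n"
                + "\"main\" #2 prio=5\n"
                + "\tat java.lang.Thread.sleep(Native Method)\n";
        System.out.println(threadsWithFrame(dump, "ql.Driver.compile"));
    }
}
```

If the same thread name is reported for the Driver.compile frame in every dump, that thread is a likely candidate for the long-running compilation.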

ROOT CAUSE:

Before Hive 2, the compilation stage of a query in HiveServer2 is single-threaded, so only one query can compile at a time and all other queries remain in a wait state until it finishes. The jstack shows that, during compilation, this thread is running the expression processor factory (ExprWalkerProcFactory) for predicate pushdown, in which each processor determines whether an expression is a possible candidate for predicate pushdown optimization for the given operator. At that time the user was running a very large query containing more than 300 'case ... then' conditions, and processing them was taking too long.
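
The effect of a single compilation lock can be illustrated with a toy model. This is only a sketch of the behaviour described above, not Hive's actual code: the class CompileLockSketch, the COMPILE_LOCK object, and the sleep that stands in for optimizer work are all assumptions made for illustration. The "- locked <...> (a java.lang.Object)" line under Driver.compileInternal in the thread dump is consistent with this kind of lock.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/** Toy model (hypothetical) of a server that compiles queries under one global lock. */
final class CompileLockSketch {
    // Stand-in for the single lock object held during compilation.
    private static final Object COMPILE_LOCK = new Object();

    /** Simulates compiling a query; costMillis stands in for optimizer work. */
    static void compile(String name, long costMillis, List<String> finished) {
        synchronized (COMPILE_LOCK) {           // only one compilation at a time
            try {
                Thread.sleep(costMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            finished.add(name);
        }
    }

    /** Starts a slow compile, then a trivial one, and returns the completion order. */
    static List<String> run() {
        List<String> finished = Collections.synchronizedList(new ArrayList<>());
        CountDownLatch slowHoldsLock = new CountDownLatch(1);

        Thread slow = new Thread(() -> {
            synchronized (COMPILE_LOCK) {       // reentrant: compile() takes it again
                slowHoldsLock.countDown();
                compile("huge query", 300, finished);
            }
        });
        Thread fast = new Thread(() -> compile("show tables", 0, finished));

        try {
            slow.start();
            slowHoldsLock.await();              // the slow thread now owns the lock
            fast.start();                       // blocks until the lock is released
            slow.join();
            fast.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return finished;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Even though the second query costs nothing to compile, it cannot finish until the expensive one releases the lock, which is why a simple show tables appears to hang.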

WORKAROUND:

Ask the customer to set hive.optimize.ppd=false at the session level while running this query, and to rewrite the SQL in a more optimized way.
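
For clients that connect over JDBC, the session-level setting can be issued as an ordinary statement on the same connection before the query runs. The following is a minimal sketch under that assumption: the class PpdWorkaroundSketch and its method names are hypothetical, the Connection is assumed to be an already-open HiveServer2 JDBC connection, and result handling is left out.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: runs one query with predicate pushdown disabled for the session. */
final class PpdWorkaroundSketch {

    /** The statements to send, in order: the session-level SET, then the query. */
    static List<String> statementsFor(String query) {
        // No trailing semicolon: each string is sent as a single JDBC statement.
        return Arrays.asList("SET hive.optimize.ppd=false", query);
    }

    /** Executes the SET and the query on the same connection, so they share a session. */
    static void runWithPpdDisabled(Connection conn, String query) throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (String sql : statementsFor(query)) {
                stmt.execute(sql);   // results of the query are not read in this sketch
            }
        }
    }
}
```

Because the property is set on the session only, other sessions keep the default behaviour for predicate pushdown.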

RESOLUTION:

Set hive.optimize.ppd=false at the session level.
