Created on 06-27-2018 10:50 PM
Problem : HiveServer2 can go into a hung state for various reasons. Because HiveServer2 is the component that compiles and executes the queries we run, a hang there stalls query processing. In most cases, the HiveServer2 logs and a jstack of the HiveServer2 process help identify the root cause. After HiveServer2 receives a query from the client and connects to the metastore, query processing consists of two major phases: the compilation phase (parsing, semantic analysis, plan generation and optimization) and the execution phase (running MapReduce tasks).
Hive Compiler :
The compiler is the component that parses the query, performs semantic analysis on the different query blocks and query expressions, and eventually generates an execution plan with the help of the table and partition metadata looked up from the metastore.
The most common causes of a hung state due to compilation are explained below.
Single Threaded Compilation Phase :
The HiveServer2 compilation phase is single-threaded by design in Hive 1. When a huge query is submitted through a Hive client (JDBC/ODBC), it enters the compilation phase, and all other HiveServer2 calls have to wait until that compilation completes, so HiveServer2 appears to be hung. The execution phase, by contrast, is multithreaded. This bottleneck has been addressed in Hive 2 (LLAP), where compilation is multithreaded with one query per session. We can identify whether HS2 is stuck in compilation using jstack. Below is a snippet from a jstack of the HS2 process when it is unresponsive because a query is stuck in the single-threaded compilation phase and blocks other threads.
Thread in compilation phase:
"HiveServer2-Handler-Pool: Thread-75" #75 prio=5 os_prio=0 tid=0x00007f6d94624800 nid=0x39c18b runnable [0x00007f6d1a560000]
   java.lang.Thread.State: RUNNABLE
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA32.specialStateTransition(HiveParser_IdentifiersParser.java)
        at org.antlr.runtime.DFA.predict(DFA.java:80)
Other thread in blocked state:
"HiveServer2-Handler-Pool: Thread-698" #698 prio=5 os_prio=0 tid=0x00007f6d9451a800 nid=0x3c5e4e waiting for monitor entry [0x00007f6d17cf8000]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189)
        - waiting to lock <0x00007f6da61b4ab8> (a java.lang.Object)
The corresponding query can be identified by searching the HiveServer2 logs for the thread number.
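The check described above can be automated. The following is a minimal sketch, not an official tool: it assumes the thread dump has been saved as plain text (for example with jstack <pid> redirected to a file) and that its layout matches the snippet in this article. The function names split_threads and find_compile_contention are hypothetical.

```python
import re
from collections import defaultdict

# Header line of a thread in jstack output, e.g.
# "HiveServer2-Handler-Pool: Thread-75" #75 prio=5 ...
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')
# Monitor a blocked thread is waiting for, e.g.
# - waiting to lock <0x00007f6da61b4ab8> (a java.lang.Object)
WAITING_LOCK = re.compile(r"waiting to lock <(?P<lock>0x[0-9a-fA-F]+)>")


def split_threads(dump_text):
    """Split raw jstack text into {thread_name: [stripped lines]}."""
    threads = {}
    current = None
    for raw_line in dump_text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        header = THREAD_HEADER.match(line)
        if header:
            current = header.group("name")
            threads[current] = []
        elif current is not None:
            threads[current].append(line)
    return threads


def find_compile_contention(dump_text):
    """Return (compiling, blocked).

    compiling: names of RUNNABLE threads with a frame in the Hive parser package.
    blocked:   {lock_address: [thread names]} for threads BLOCKED in
               Driver.compileInternal.
    """
    compiling = []
    blocked = defaultdict(list)
    for name, lines in split_threads(dump_text).items():
        body = "\n".join(lines)
        if ("Thread.State: RUNNABLE" in body
                and "org.apache.hadoop.hive.ql.parse." in body):
            compiling.append(name)
        elif ("Thread.State: BLOCKED" in body
                and "Driver.compileInternal" in body):
            lock = WAITING_LOCK.search(body)
            blocked[lock.group("lock") if lock else "unknown"].append(name)
    return compiling, dict(blocked)
```

If many handler threads are reported as blocked on the same lock address while one thread is listed as compiling, that thread's name is the one to search for in the HiveServer2 logs. The parser-package match covers only the parsing stage; other compilation stages would need additional frame patterns.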
Mitigation :
Splitting the huge query into multiple smaller queries.
Configuring multiple HiveServer2 instances to share the load.
Restarting HiveServer2.
Parsing :
Sometimes, if a query has too many '(' characters in its AND/OR conditions, HiveServer2 takes a long time to parse it because of a product bug, HIVE-15388, which is fixed in HDP 2.6.X versions. This can also be identified from a jstack of the HS2 process. The permanent solution is to upgrade to the latest version.
"HiveServer2-Handler-Pool: Thread-483" #483 prio=5 os_prio=0 tid=0x00007fc6153ac800 nid=0x752a runnable [0x00007fc5a0e09000] java.lang.Thread.State: RUNNABLE at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA43.specialStateTransition(HiveParser_IdentifiersParser.java) at org.antlr.runtime.DFA.predict(DFA.java:80) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8115) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9886) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceAndExpression(HiveParser_IdentifiersParser.java:10005) at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceOrExpression(HiveParser_IdentifiersParser.java:10195)
Ranger Authorization :
Sometimes create table statements for external tables take a long time to complete and eventually make HiveServer2 unresponsive when the Ranger Hive plugin is enabled. When the file path specified in Hive statements like 'create external table' does not exist, the Ranger Hive authorizer checks permissions on all the subdirectories and their files. For example, if the S3 location that the external table points to holds 20,000 files, Ranger has to perform the file permission check for all 20,000 files, including files under subdirectories, which means 20,000 iterations. This is why Hive becomes unresponsive to other calls. This can be identified from a jstack of the HS2 process. It is also addressed in HDP 2.6.X versions (RANGER-1126 & HIVE-10022). It can also be mitigated by executing the statements from the Hive CLI to bypass HS2 and Ranger authorization.
org.apache.hadoop.hive.common.FileUtils.checkFileAccessWithImpersonation(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, org.apache.hadoop.fs.permission.FsAction, java.lang.String) @bci=31, line=381 (Compiled frame)
 - org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, java.lang.String, org.apache.hadoop.fs.permission.FsAction, boolean) @bci=27, line=429 (Compiled frame)
 - org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, java.lang.String, org.apache.hadoop.fs.permission.FsAction, boolean) @bci=91, line=443 (Compiled frame)
 - org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, java.lang.String, org.apache.hadoop.fs.permission.FsAction, boolean) @bci=91, line=443 (Compiled frame)
 - org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, java.lang.String, org.apache.hadoop.fs.permission.FsAction, boolean) @bci=91, line=443 (Compiled frame)
 - org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, java.lang.String, org.apache.hadoop.fs.permission.FsAction) @bci=5, line=415 (Compiled frame)
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.isURIAccessAllowed(java.lang.String, org.apache.hadoop.fs.permission.FsAction, java.lang.String, org.apache.hadoop.hive.conf.HiveConf) @bci=104, line=1026 (Compiled frame)
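To get a feel for how the cost of the recursive check grows, one can count the entries a full walk of the location would visit. The sketch below is illustrative only: it walks a local directory as a stand-in for the HDFS or S3 location, it does not call Ranger or Hive, and count_permission_checks is a hypothetical name.

```python
import os


def count_permission_checks(root):
    """Count the directories and files a recursive walk visits under root.

    Returns (dirs, files). Each counted entry stands for one permission
    check in a walk that inspects every file and subdirectory.
    """
    dirs = 1  # the root directory itself
    files = 0
    for _path, subdirs, filenames in os.walk(root):
        dirs += len(subdirs)
        files += len(filenames)
    return dirs, files
```

For a real table location, the same numbers can be obtained from the file system itself, for example with hdfs dfs -count on the path. A location with tens of thousands of entries is a likely candidate for the slowdown described above.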
Tree Traversal :
Submitting complex queries can sometimes cause a tree traversal issue, which in turn hangs the compiler thread and blocks Hive from accepting other requests. Turning off hive.optimize.ppd at the session level (set hive.optimize.ppd=false;) can address the compilation issue, but it can penalize query performance. An example jstack snippet for this issue follows.
"HiveServer2-Handler-Pool: Thread-86129" #86129 prio=5 os_prio=0 tid=0x00007f3ad9e1a800 nid=0x1003b runnable [0x00007f3a73b0a000]
   java.lang.Thread.State: RUNNABLE
        at java.util.HashMap$TreeNode.find(HashMap.java:1865)
        at java.util.HashMap$TreeNode.find(HashMap.java:1861)
        at java.util.HashMap$TreeNode.find(HashMap.java:1861)
        at java.util.HashMap$TreeNode.find(HashMap.java:1861)
        at java.util.HashMap$TreeNode.find(HashMap.java:1861)
        at java.util.HashMap$TreeNode.find(HashMap.java:1861)
        at java.util.HashMap$TreeNode.find(HashMap.java:1861)
        at java.util.HashMap$TreeNode.find(HashMap.java:1861)
        at java.util.HashMap$TreeNode.getTreeNode(HashMap.java:1873)
        at java.util.HashMap.getNode(HashMap.java:575)
        at java.util.HashMap.get(HashMap.java:556)