Member since 06-28-2016 · 18 Posts · 31 Kudos Received · 0 Solutions
06-27-2018 10:50 PM · 5 Kudos
Problem : HiveServer2 can go into a hung state for various reasons. Since HiveServer2 is the component that compiles and executes the queries we run, its logs and a jstack of the HiveServer2 process can identify the root cause in most cases. After receiving a query from the client and connecting to the metastore, Hive query processing comprises two major steps: the compilation phase (parsing, semantic analysis, plan generation, and optimization) and the execution phase (running the MapReduce tasks).
Hive Compiler :
The compiler is the component that parses the query, performs semantic analysis on the different query blocks and query expressions, and eventually generates an execution plan with the help of the table and partition metadata looked up from the metastore.
The most common causes of a hung state during compilation are explained below.
Single Threaded Compilation Phase :
The HiveServer2 compilation phase is single threaded by design in Hive 1, so when a huge query is submitted through a Hive client (JDBC/ODBC) and enters the compilation phase, other HiveServer2 calls have to wait until that compilation completes, and the server appears to be hung. The execution phase, by contrast, is multithreaded. This bottleneck has been addressed in Hive 2 (LLAP), where compilation is multithreaded with one query per session. We can identify whether HS2 is stuck in compilation using jstack. Below is a snippet of the jstack of the HS2 process when it is unresponsive because a query is stuck in the single-threaded compilation phase and blocks other threads.
Thread in compilation phase:
"HiveServer2-Handler-Pool: Thread-75" #75 prio=5 os_prio=0 tid=0x00007f6d94624800 nid=0x39c18b runnable [0x00007f6d1a560000]
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA32.specialStateTransition(HiveParser_IdentifiersParser.java)
at org.antlr.runtime.DFA.predict(DFA.java:80)
Another thread in blocked state:
"HiveServer2-Handler-Pool: Thread-698" #698 prio=5 os_prio=0 tid=0x00007f6d9451a800 nid=0x3c5e4e waiting for monitor entry [0x00007f6d17cf8000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1189)
- waiting to lock <0x00007f6da61b4ab8> (a java.lang.Object)
The corresponding query can be identified using the thread number in the HiveServer2 logs.
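For example, a minimal sketch (the PID, log path, and thread name are illustrative, and it assumes the HS2 log4j pattern prints the thread name):
# capture a thread dump, spot the compiling thread, then match it in the log
jstack <hs2_pid> > hs2_jstack.out
grep "HiveServer2-Handler-Pool: Thread-75" /var/log/hive/hiveserver2.log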
Mitigation :
Splitting the huge query into multiple smaller queries.
Configuring multiple HiveServer2 instances to share the load.
Restarting HiveServer2.
Parsing :
Sometimes, if a query has too many '(' characters in AND/OR conditions, HiveServer2 takes a long time to parse it because of a product bug, HIVE-15388, which is fixed in HDP 2.6.x versions. This can also be identified from a jstack of the HS2 process. The permanent solution is to upgrade to the latest version.
"HiveServer2-Handler-Pool: Thread-483" #483 prio=5 os_prio=0 tid=0x00007fc6153ac800 nid=0x752a runnable [0x00007fc5a0e09000]
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser$DFA43.specialStateTransition(HiveParser_IdentifiersParser.java)
at org.antlr.runtime.DFA.predict(DFA.java:80)
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceEqualExpression(HiveParser_IdentifiersParser.java:8115)
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceNotExpression(HiveParser_IdentifiersParser.java:9886)
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceAndExpression(HiveParser_IdentifiersParser.java:10005)
at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.precedenceOrExpression(HiveParser_IdentifiersParser.java:10195)
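As an illustration, a query of roughly this shape can trigger the slow parse (the table and column are made up, and real cases have hundreds of nested terms):
# many nested parentheses around OR terms stress the parser (HIVE-15388)
beeline -u "$HS2_URL" -e "select count(*) from tab1
  where ((((col = 1) or (col = 2)) or (col = 3)) or (col = 4));"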
Ranger Authorization :
Sometimes CREATE TABLE statements for external tables take a long time to complete and eventually make HiveServer2 unresponsive when the Ranger Hive plugin is enabled. When the file path specified in Hive statements like 'create external table' does not exist, the Ranger Hive authorizer checks permissions on all the subdirectories and their files. For example, if the S3 location the external table points to contains 20,000 files, Ranger has to do a file permission check for all 20k files, including files under subdirectories; that is 20k iterations. This is why Hive becomes unresponsive to other calls. It can be identified from a jstack of the HS2 process. This is also addressed in HDP 2.6.x versions (RANGER-1126 & HIVE-10022), and it can be mitigated by executing the statements from the Hive CLI to bypass HS2 and Ranger authorization.
org.apache.hadoop.hive.common.FileUtils.checkFileAccessWithImpersonation(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, org.apache.hadoop.fs.permission.FsAction, java.lang.String) @bci=31, line=381 (Compiled frame)
- org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, java.lang.String, org.apache.hadoop.fs.permission.FsAction, boolean) @bci=27, line=429 (Compiled frame)
- org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, java.lang.String, org.apache.hadoop.fs.permission.FsAction, boolean) @bci=91, line=443 (Compiled frame)
- org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, java.lang.String, org.apache.hadoop.fs.permission.FsAction, boolean) @bci=91, line=443 (Compiled frame)
- org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, java.lang.String, org.apache.hadoop.fs.permission.FsAction, boolean) @bci=91, line=443 (Compiled frame)
- org.apache.hadoop.hive.common.FileUtils.isActionPermittedForFileHierarchy(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.FileStatus, java.lang.String, org.apache.hadoop.fs.permission.FsAction) @bci=5, line=415 (Compiled frame)
org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizer.isURIAccessAllowed(java.lang.String, org.apache.hadoop.fs.permission.FsAction, java.lang.String, org.apache.hadoop.hive.conf.HiveConf) @bci=104, line=1026 (Compiled frame)
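A minimal sketch of that Hive CLI mitigation (the table definition and S3 location are made up):
# the Hive CLI bypasses HS2 and therefore the Ranger Hive plugin
hive -e "create external table ext_tab (id int, name string) location 's3a://bucket/path/'"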
Tree Traversal : Submitting complex queries may sometimes cause a tree traversal issue, which in turn hangs the compiler thread and blocks Hive from accepting other requests. Turning off hive.optimize.ppd at the session level can work around the compilation issue, but it can penalize performance; see the sketch after the snippet. An example jstack snippet for this issue:
"HiveServer2-Handler-Pool: Thread-86129" #86129 prio=5 os_prio=0 tid=0x00007f3ad9e1a800 nid=0x1003b runnable [0x00007f3a73b0a000]
java.lang.Thread.State: RUNNABLE
at java.util.HashMap$TreeNode.find(HashMap.java:1865)
at java.util.HashMap$TreeNode.find(HashMap.java:1861)
at java.util.HashMap$TreeNode.find(HashMap.java:1861)
at java.util.HashMap$TreeNode.find(HashMap.java:1861)
at java.util.HashMap$TreeNode.find(HashMap.java:1861)
at java.util.HashMap$TreeNode.find(HashMap.java:1861)
at java.util.HashMap$TreeNode.find(HashMap.java:1861)
at java.util.HashMap$TreeNode.find(HashMap.java:1861)
at java.util.HashMap$TreeNode.getTreeNode(HashMap.java:1873)
at java.util.HashMap.getNode(HashMap.java:575)
at java.util.HashMap.get(HashMap.java:556)
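A session-level sketch of the workaround (the connection URL and table name are placeholders):
# turn off predicate pushdown for this session only, then run the query
beeline -u "$HS2_URL" -e "set hive.optimize.ppd=false;
select count(*) from tab1;"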
06-30-2017 09:00 PM · 4 Kudos
ISSUE : Hive Metastore stops responding until a restart happens.
WORKAROUND : Please collect the jstack output 5 times at an interval of 30 seconds while the Hive Metastore is hung:
jstack -F <hivemetastorepid> > jstack.out
Then analyze the jstack output and look for the keyword "locked". If you see something similar to the snippet below, there are locked threads related to the filesystem cache.
"pool-5-thread-180" #12021 prio=5 os_prio=0 tid=0x00007f63de37e000 nid=0x3900 runnable [0x00007f6391fc3000]
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.fs.FileSystem$Cache.closeAll(FileSystem.java:2886)
- locked <0x00000005c0032ca8> (a org.apache.hadoop.fs.FileSystem$Cache)
at org.apache.hadoop.fs.FileSystem.closeAllForUGI(FileSystem.java:473)
at org.apache.hadoop.hive.metastore.TUGIBasedProcessor.process(TUGIBasedProcessor.java:131)
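For the collection step, a small loop works (a sketch; replace the placeholder with the actual metastore PID):
# take 5 thread dumps 30 seconds apart for later comparison
HMS_PID=<hivemetastorepid>
for i in 1 2 3 4 5; do
  jstack -F "$HMS_PID" > "jstack.$i.out"
  sleep 30
done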
Please add the following to hive-site.xml and restart the Hive services. These properties disable the HDFS filesystem cache.
fs.hdfs.impl.disable.cache=true
fs.file.impl.disable.cache=true
06-29-2017 07:39 PM · 4 Kudos
ISSUE : A simple SELECT query against an ORC table without a LIMIT clause fails with the exception below.
2017-06-26 16:00:35,228 ERROR [main]: CliDriver (SessionState.java:printError(993)) - Failed with exception java.io.IOException:java.lang.RuntimeException: serious problem
java.io.IOException: java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:520)
at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:427)
at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1765)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:237)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
Caused by: java.lang.RuntimeException: serious problem
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1258)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1285)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:371)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:303)
at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:458)
... 15 more
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1253)
... 19 more
Caused by: java.lang.OutOfMemoryError: Java heap space
We can see the query is failing while it tries to generate ORC splits.
WORKAROUND :
hive.exec.orc.split.strategy=BI
This property selects the strategy ORC uses to create splits for execution. The available options are "BI", "ETL" and "HYBRID". The default setting is HYBRID.
The HYBRID mode reads the footers for all files if there are fewer files than expected mapper count, switching over to generating 1 split per file if the average file sizes are smaller than the default HDFS blocksize. ETL strategy always reads the ORC footers before generating splits, while the BI strategy generates per-file splits fast without reading any data from HDFS.
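A session-level sketch applying the workaround (the connection URL and table name are placeholders):
# use the BI strategy so splits are generated per file without reading footers
beeline -u "$HS2_URL" -e "set hive.exec.orc.split.strategy=BI;
select * from orc_tab;"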
06-29-2017 07:04 PM · 2 Kudos
ISSUE : After an OS patch on all hosts in the cluster, the DataNodes start but none of them show as alive in the HDFS dashboard, and all blocks are reported missing.
Linux cl-master1 2.6.32-696.3.2.el6.x86_64 #1 SMP Wed Jun 7 11:51:39 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux
RESOLUTION :
This is a kernel bug; see https://issues.apache.org/jira/browse/HDFS-12029. First put the NameNode in safe mode using the command below:
hdfs dfsadmin -safemode enter
Then add "-Xss2m" to the exec block shown below in /usr/hdp/2.6.0.3-8/hadoop-hdfs/bin/hdfs.distro on all the DataNodes:
exec "$JSVC" \
-Xss2m \
-Dproc_$COMMAND -outfile "$JSVC_OUTFILE" \
-errfile "$JSVC_ERRFILE" \
-pidfile "$HADOOP_SECURE_DN_PID" \
-nodetach \
-user "$HADOOP_SECURE_DN_USER" \
-cp "$CLASSPATH" \
$JAVA_HEAP_MAX $HADOOP_OPTS \
org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter "$@"
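A hedged one-liner for making that edit on a DataNode (adjust the HDP version in the path; it writes a .bak backup, and the edited file should be verified before restarting):
# insert "  -Xss2m \" right after the 'exec "$JSVC" \' line
sed -i.bak '/^exec "\$JSVC" \\$/a\
  -Xss2m \\' /usr/hdp/2.6.0.3-8/hadoop-hdfs/bin/hdfs.distro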
Please restart the DataNodes from Ambari and take the NameNode out of safe mode:
hdfs dfsadmin -safemode leave
03-27-2017 08:00 PM · 3 Kudos
PROBLEM :
When the property skip.header.line.count is enabled on tables with large text files (~100 GB), the tez.grouping.* parameters are ineffective, spinning up one container per file.
EXAMPLE:
We have a table with 10 files:
117.0 G A.txt
29.0 G B.txt
30.0 G C.txt
1.9 G D.txt
11.0 G E.txt
18.9 G F.txt
5.7 G G.txt
159.4 H.txt
2.4 G I.txt
214.4 M J.txt
We created a table on top of this data, using:
CREATE TABLE `tab1`(
`opr_glbl_sku_id` string,
`opr_loc_nbr` string,
`wk_start_dt` string,
`base_fcst` float)
...
TBLPROPERTIES (
'skip.header.line.count'='1')
We set the parameters below, after which the expectation was that at least 200 G / 0.5 G = 400 containers would be launched:
SET tez.grouping.min-size=53421772;
SET tez.grouping.max-size=534217728;
With Tez as the execution engine, an attempt to perform select count(*) on the table spins up only 10 containers.
Once we removed skip.header.line.count, it launched about 540 containers, which is the expected behavior.
ROOT CAUSE :
hive.tez.input.format was set to the default, org.apache.hadoop.hive.ql.io.HiveInputFormat. HiveInputFormat is a parameterized InputFormat that looks at the path name and determines the correct InputFormat for that path from mapredPlan.pathToPartitionInfo(). It can be used to read files with different input formats in the same MapReduce job.
RESOLUTION :
CombineHiveInputFormat combines all the small files to generate a split:
hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
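A session-level sketch combining the settings (the connection URL is a placeholder; tab1 is the table above):
# group small splits and cap split sizes, then rerun the count
beeline -u "$HS2_URL" -e "
set hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
set tez.grouping.min-size=53421772;
set tez.grouping.max-size=534217728;
select count(*) from tab1;"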
03-27-2017 07:57 PM
1. Determine the version of the source Hive database (on the old cluster):
mysql> select * from VERSION;
+--------+----------------+----------------------------+
| VER_ID | SCHEMA_VERSION | VERSION_COMMENT |
+--------+----------------+----------------------------+
| 1 | 1.2.0 | Hive release version 1.2.0 |
+--------+----------------+----------------------------+
2. On the old cluster's metastore MySQL database, take a dump of the hive database:
mysqldump -u root hive > /data/hive_20161122.sql
3. Do a find-and-replace in the dump file for any hostname from the old cluster and change it to the new cluster (i.e., the NameNode address); see the sketch after these steps. Be cautious when replacing the hostname/HDFS URL if the target cluster is HA enabled.
4. On the new cluster, stop the Hive Metastore service.
5. Within MySQL, drop and re-create the hive database:
mysql> drop database hive; create database hive;
6. Run the schematool to initialize the schema to the exact version of the Hive schema on the old cluster:
/usr/hdp/current/hive-metastore/bin/schematool -dbType mysql -initSchemaTo 1.2.0 -userName hive -passWord '******' -verbose
7. Import the hive database dump from the old cluster:
mysql -u root hive < /data/hive_20161122.sql
8. Upgrade the schema to the latest schema version:
/usr/hdp/current/hive-metastore/bin/schematool -upgradeSchema -dbType mysql -userName hive -passWord '******' -verbose
9. Start the Hive Metastore.
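The find-and-replace in step 3, as a hedged sketch (the hostnames are made up; the .bak keeps a backup of the dump):
# swap the old NameNode hostname for the new one throughout the dump
sed -i.bak 's/oldnn\.example\.com/newnn.example.com/g' /data/hive_20161122.sql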
03-27-2017 07:54 PM · 3 Kudos
PROBLEM : A query fails with the exception below.
ERROR :
java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: java.io.IOException: java.lang.NullPointerException
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(TezGroupedSplitsInputFormat.java:192)
at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(TezGroupedSplitsInputFormat.java:131) ROOT CAUSE :
The query fails on a field of datatype timestamp within an ORC table that contains null values, when predicate pushdown is enabled.
This is a known issue. Reference: https://hortonworks.jira.com/browse/BUG-47472
WORKAROUND / RESOLUTION :
This setting controls whether indexes are used automatically; disabling it can carry a performance penalty.
hive.optimize.index.filter=false;
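A session-level sketch (the connection URL, table, and column names are placeholders):
# disable automatic index use for this session, then rerun the failing query
beeline -u "$HS2_URL" -e "set hive.optimize.index.filter=false;
select * from orc_tab where ts_col is not null;"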
03-23-2017 08:06 PM
PROBLEM : Cloudbreak failed to start.
SYMPTOM : We get the error below: "bind: address already in use".
ERROR :
ERROR: for consul Cannot start container 8a0b07e6b35e7350656b54314f7023107f417c0e0031db28e91f6bfb7a245d6f: failed to create endpoint cbreak_consul_1 on network bridge: Error starting userland proxy: listen udp X.X.X.X: bind: address already in use
Traceback (most recent call last):
File "<string>", line 3, in <module>
File "compose/cli/main.py", line 63, in main
AttributeError: 'ProjectError' object has no attribute 'msg'
docker-compose returned -1
ROOT CAUSE : Cloudbreak may have a stale process, because some containers are still running from the previous run.
RESOLUTION : Please run the following :
cbd kill
# check that all containers have stopped (e.g., with docker ps)
cbd start
This should fix the issue.
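If the port conflict persists after cbd kill, a quick check helps (the port number is a placeholder; take the real one from the error message):
# find which process still holds the UDP port consul needs
sudo netstat -ulpn | grep 8600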
03-23-2017 07:54 PM · 3 Kudos
[Attached screenshots: screen-shot-2017-03-09-at-44439-pm.png, screen-shot-2017-03-09-at-44200-pm.png]
Please use the REST-plugin tool to perform the deletion. Please see the attached screenshots for how to use it, and the instructions below on how to build the encoded password (the string that follows the word Basic).
REST-plugin :
https://chrome.google.com/webstore/detail/advanced-rest-client/hgmloofddffdnphfgcellkdfbfbjeloo/related?hl=en-US
You can encode your password using this tool : http://base64-encoding.online-domain-tools.com/
Type it in the format username:password to get the encoded password.
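A local alternative to the online tool, on any machine with base64 (the credentials shown are placeholders):
# build the string that follows the word Basic in the Authorization header
echo -n 'username:password' | base64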
03-23-2017 07:41 PM · 2 Kudos
Why doesn't a Hive rolling restart pick up configuration changes every time? That depends entirely on which configuration you are trying to change. For example :
1. We changed the value of hive.exec.compress.output to true and restarted one HiveServer2; when we checked in beeline with set hive.exec.compress.output, the value had changed.
2. Then we changed the log4j property hive.root.logger to DEBUG and restarted only HiveServer2, but the change did not take effect until we restarted the Hive Metastore on the host.
3. Then we changed hadoop.security.credential.provider.path, but it did not take effect until we refreshed the client configs first and then restarted the Metastore and HiveServer2 (the restart order is important).
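A quick way to verify whether a changed property is live on a given HiveServer2 (the connection URL is a placeholder):
# print the current value as HS2 sees it after the restart
beeline -u "$HS2_URL" -e "set hive.exec.compress.output;"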