Created 12-14-2016 04:43 PM
A simple query like SELECT COUNT(*) FROM table WHERE d='2016-12-14' fails in beeline with the Tez engine, but works with the MR engine. It also works with the deprecated Hive CLI client on both engines. The error thrown is:
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1481626938017_2302_8_00, diagnostics=[Vertex vertex_1481626938017_2302_8_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: impressions initializer failed, vertex=vertex_1481626938017_2302_8_00 [Map 1], java.lang.RuntimeException: serious problem
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1273)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1300)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 1
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1268)
    ... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSargColumnNames(OrcInputFormat.java:358)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:392)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1011)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2000(OrcInputFormat.java:838)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:992)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:989)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:989)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:838)
    ... 4 more
Created 12-14-2016 06:38 PM
Counts are pretty rough. What other settings do you have?
Looks like you are out of memory.
Tez runs in memory, like Spark, while MapReduce uses more disk.
How big is the data?
See:
https://community.hortonworks.com/questions/24730/hive-job-failed-on-tez.html
Created 12-14-2016 10:16 PM
my .02
Switching between MR and Tez is not as simple as setting the execution engine. The tuning parameters and the container sizes required differ between the two engines. I recommend you start by increasing your container size, then tune the query (set parameters) for Tez accordingly.
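As a rough sketch of what that tuning might look like in a beeline session: the property names below are standard Hive settings, but the values are illustrative assumptions only and would need to be sized against your YARN container limits.

```sql
-- Illustrative values only (assumptions); size them to your cluster.
SET hive.execution.engine=tez;
-- Memory per Tez task container, in MB.
SET hive.tez.container.size=4096;
-- JVM heap for Tez tasks; kept below the container size (about 80% of it here).
SET hive.tez.java.opts=-Xmx3276m;
```

Session-level SET statements like these only affect the current connection, so they are a low-risk way to experiment before changing anything cluster-wide.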
Created 12-15-2016 09:16 AM
Judging by the stack trace, do you think it's a matter of memory?
Created 12-15-2016 07:58 PM
From the stack this might be a bug in Hive. You should open a JIRA on Apache Hive for this to get a better response.
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSargColumnNames(OrcInputFormat.java:358)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:392)
Created 12-21-2016 08:00 PM
@Joan Viladrosa Could you post your table definition and a description of the commands you ran?
Created 12-23-2016 02:49 PM
This looks like it might be related to BI vs ETL query ORC optimizations. I ran into the same issue with HDP 2.5.3 and was able to work around it by setting the ORC split strategy.
From the beeline command line, try "set hive.exec.orc.split.strategy=BI;" and then execute your SQL statement again.
Let us know the result.
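For reference, the workaround above might look like this in a beeline session. The table name impressions is an assumption taken from the "Vertex Input: impressions" part of the error message, and the column d comes from the original query; substitute your own.

```sql
-- Use the BI split strategy for ORC (other values: HYBRID, the default, and ETL).
SET hive.exec.orc.split.strategy=BI;
SET hive.execution.engine=tez;
-- Re-run the failing query; the table name is assumed from the error message.
SELECT COUNT(*) FROM impressions WHERE d='2016-12-14';
```

If the query succeeds with BI but fails again with ETL or HYBRID, that would point at ORC split generation rather than memory, which would be consistent with the getSargColumnNames frame in the stack trace.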