After upgrading to HDP 2.5.3: using beeline with a simple query throws Exception with TEZ engine, but works with MR

A simple query like SELECT COUNT(*) FROM table WHERE d='2016-12-14' fails in beeline with the TEZ engine, but works with the MR engine. It also works with the deprecated Hive CLI client with both engines. The error thrown is:

ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
	Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1481626938017_2302_8_00, diagnostics=[Vertex vertex_1481626938017_2302_8_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: impressions initializer failed, vertex=vertex_1481626938017_2302_8_00 [Map 1], java.lang.RuntimeException: serious problem
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1273)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1300)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:307)
	at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:409)
	at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:155)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:273)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:266)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:266)
	at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 1
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1268)
	... 15 more
	Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSargColumnNames(OrcInputFormat.java:358)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:392)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1011)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2000(OrcInputFormat.java:838)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:992)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:989)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1724)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:989)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:838)
	... 4 more
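
For anyone trying to reproduce this, here is a minimal sketch of the comparison as it might be run in a beeline session. The table name impressions is an assumption taken from the "Vertex Input" in the error above, and the partition value is the one from the question; substitute your own.

```sql
-- Reported to fail with the ArrayIndexOutOfBoundsException shown above
SET hive.execution.engine=tez;
SELECT COUNT(*) FROM impressions WHERE d='2016-12-14';

-- Reported to succeed: same query, MapReduce engine
SET hive.execution.engine=mr;
SELECT COUNT(*) FROM impressions WHERE d='2016-12-14';
```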
6 REPLIES
Re: After upgrading to HDP 2.5.3: using beeline with a simple query throws Exception with TEZ engine, but works with MR

Super Guru

Counts are pretty rough. What other settings do you have?

Looks like you are out of memory.

Tez, like Spark, keeps more of its intermediate data in memory, while MapReduce relies more on disk.

How big is the data?

See:

https://community.hortonworks.com/questions/24730/hive-job-failed-on-tez.html

Re: After upgrading to HDP 2.5.3: using beeline with a simple query throws Exception with TEZ engine, but works with MR

Super Guru

my .02

Switching between MR and Tez is not as simple as setting the execution engine. The tuning parameters and required container sizes differ between the two engines. I recommend you start by increasing your container size, then tune the query (set parameters) for Tez accordingly.
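
As a rough sketch of what that tuning could look like at session level: hive.tez.container.size (in MB) and hive.tez.java.opts are the usual Hive-on-Tez settings for this. The numbers below are placeholders, not recommendations, and the heap is set to roughly 80% of the container as a common rule of thumb.

```sql
-- Placeholder values; size them to your cluster's YARN limits
SET hive.execution.engine=tez;
SET hive.tez.container.size=4096;   -- container size in MB (assumed value)
SET hive.tez.java.opts=-Xmx3276m;   -- JVM heap, about 80% of the container (assumed value)
```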

Re: After upgrading to HDP 2.5.3: using beeline with a simple query throws Exception with TEZ engine, but works with MR

Judging by the stack trace, do you think it's a memory problem?

Re: After upgrading to HDP 2.5.3: using beeline with a simple query throws Exception with TEZ engine, but works with MR

Expert Contributor

From the stack trace, this might be a bug in Hive. You should open a JIRA on Apache Hive to get a better response.

Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSargColumnNames(OrcInputFormat.java:358)
	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.setSearchArgument(OrcInputFormat.java:392)

Re: After upgrading to HDP 2.5.3: using beeline with a simple query throws Exception with TEZ engine, but works with MR

Expert Contributor

@Joan Viladrosa Could you post your table definition and a description of the commands you ran?

Re: After upgrading to HDP 2.5.3: using beeline with a simple query throws Exception with TEZ engine, but works with MR

Explorer

This looks like it might be related to BI vs ETL query ORC optimizations. I ran into the same issue with HDP 2.5.3 and was able to work around it by setting the ORC split strategy.

From the beeline command line, try "set hive.exec.orc.split.strategy=BI;" and then re-run your SQL statement.
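
A minimal sketch of that workaround in a beeline session, assuming the table from the original error is named impressions (taken from the "Vertex Input" in the stack trace). As I understand it, the BI strategy builds splits without reading the ORC file footers, which may be why it sidesteps the failing split-generation path, but treat that as an assumption rather than a confirmed diagnosis.

```sql
-- Session-level workaround; Tez stays the execution engine
SET hive.execution.engine=tez;
SET hive.exec.orc.split.strategy=BI;
SELECT COUNT(*) FROM impressions WHERE d='2016-12-14';

-- To go back to the default afterwards:
-- SET hive.exec.orc.split.strategy=HYBRID;
```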

Let us know the result.
