Created 03-29-2016 01:13 PM
We have several queries that fail on MR but succeed on Tez.
When they fail, the logs are full of errors like the ones below. They usually point to specific rows. However, if I reduce the scope of the query, but include the "bad" rows, the queries usually succeed without errors. So it clearly isn't specific to those rows.
I'm guessing there is some kind of overflow happening internally.
I have submitted several instances of this in support tickets, and the feedback is always "please upgrade or just use Tez", but that really isn't a solution, and we just upgraded recently.
I'm looking for guidance on ways that we might tune our Hive or MR settings to work around this.
Thanks.
2016-03-29 08:30:03,751 FATAL [main] org.apache.hadoop.hive.ql.exec.mr.ExecMapper: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {<row data>}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:534)
    at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:176)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:397)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.FilterOperator.processOp(FilterOperator.java:120)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:95)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:159)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:524)
    ... 9 more
Caused by: java.lang.ArrayIndexOutOfBoundsException
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1450)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer$Buffer.write(MapTask.java:1346)
    at java.io.DataOutputStream.writeInt(DataOutputStream.java:197)
    at org.apache.hadoop.io.BytesWritable.write(BytesWritable.java:186)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1146)
    at org.apache.hadoop.mapred.MapTask$OldOutputCollector.collect(MapTask.java:607)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.collect(ReduceSinkOperator.java:531)
    at org.apache.hadoop.hive.ql.exec.ReduceSinkOperator.processOp(ReduceSinkOperator.java:380)
    ... 15 more
Created 03-29-2016 01:18 PM
Strange, I have usually seen the opposite pattern: things failing with Tez but working with MR. And after upgrading to the latest version of HDP, the Tez error was fixed.
Could you tell us why you don't want to use Tez? Tez is usually much faster than MR.
Created 03-29-2016 01:30 PM
Thanks @Sourygna Luangsay
We do use Tez for many things. I honestly haven't found it to be "much faster than MR", though it is usually a bit faster.
But I like MR because it integrates very well with the Application Manager GUI. I can find all my logs very easily through the GUI, and even share links with my team when there is a stack trace or something in the logging that needs attention.
It also makes it very easy to diagnose when one node on our cluster is a bottleneck. When a query runs slowly, I can watch the mappers and reducers, and can easily see which servers are taking the longest.
I don't know of a good way to do any of those things with Tez. We use the Tez View, but it is buggy. And when it works, it takes many more clicks to find answers.
That's just my experience. Maybe there's a better way to leverage Tez...
Created 03-29-2016 01:51 PM
I guess whether Tez is faster than MR depends on the kind of queries you have. But this is what I have seen in different customers' projects.
Could you tell us which version of HDP you use? I acknowledge that Hive views are not as intuitive as the MR Web UI at the beginning, but they do not seem that buggy to me. And you can still send the logs as a URL to people on your team.
As for diagnosing bottlenecks, I would recommend trying Swimlane with Tez:
https://github.com/apache/tez/tree/master/tez-tools/swimlanes
This is a graphical tool that will help you to understand which container/vertex is the bottleneck in your query.
Created 03-29-2016 03:08 PM
Cool! I'll check it out.
And to answer your question: we are on 2.2.8
Created 10-28-2016 05:50 PM
I was able to fix a similar issue with:
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;
I'm not sure why we have to disable vectorized execution, but it fixed this for us. I hope this helps.
Created 04-09-2018 08:23 PM
Hi, I tried using the settings below before running the insert command, but I got the same error again:
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;
Created 04-23-2018 08:45 PM
I got it to work using
set hive.auto.convert.join=false;
Created 06-09-2021 06:42 AM
You can try the set parameters below:
set hive.vectorized.execution.reduce.enabled=false;
and
set hive.vectorized.execution.enabled=true;
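For anyone landing here with the same trace, the session-level workarounds reported in this thread can be collected roughly as below. This is only a sketch: the replies disagree on whether vectorized execution should be on or off, so try one change at a time and rerun the failing query. The last setting, `mapreduce.task.io.sort.mb`, was not reported by anyone in this thread. It is an assumption based on the trace failing inside `MapTask$MapOutputBuffer`, and the value 512 is purely illustrative.

```sql
-- Workaround reported on 10-28-2016: disable vectorized execution.
set hive.vectorized.execution.enabled=false;
set hive.vectorized.execution.reduce.enabled=false;

-- Workaround reported on 04-23-2018: disable automatic map-join conversion.
set hive.auto.convert.join=false;

-- ASSUMPTION, not confirmed in this thread: the exception is thrown while
-- writing to the map-side sort buffer, so a smaller buffer may be worth testing.
-- Print the current value first, then try a lower, illustrative one.
set mapreduce.task.io.sort.mb;
set mapreduce.task.io.sort.mb=512;
```

These are all per-session settings, so they can be tested from a single Hive session without touching the cluster-wide configuration.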