Created 05-03-2019 01:50 PM
This query outputs NPE. The tasks with NPEs are retried, and most of the times (but not always) end up succeeding.
I could not find a smaller query showing my problem so I give here my full query:
select s.ts_utc as sent_dowhour , o.ts_utc as open_dowhour , sum(count(s.ts_utc)) over(partition by s.ts_utc) as sent_count from vault.sent s left join open o on o.id=s.id group by 1, 2
My guess is that the construction
sum(count(...)) over(partition by ...)
has issues.
When it fails, this is the output I get:
Vertex failed, vertexName=Reducer 2, vertexId=vertex_1556016846110_42971_7_03, diagnostics= » Task failed, taskId=task_1556016846110_42971_7_03_000221, diagnostics= » TaskAttempt 0 failed, info= » Error: Error while running task ( failure ) : attempt_1556016846110_42971_7_03_000221_0:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:296) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:250) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:374) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:108) at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:41) at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:77) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:304) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.run(ReduceRecordProcessor.java:318) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:267) ... 16 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:378) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(ReduceRecordSource.java:294) ... 18 more Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:795) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$GroupIterator.next(ReduceRecordSource.java:363) ... 19 more Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.ql.exec.persistence.PTFRowContainer.first(PTFRowContainer.java:115) at org.apache.hadoop.hive.ql.exec.PTFPartition.iterator(PTFPartition.java:114) at org.apache.hadoop.hive.ql.udf.ptf.BasePartitionEvaluator.getPartitionAgg(BasePartitionEvaluator.java:200) at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.evaluateFunctionOnPartition(WindowingTableFunction.java:155) at org.apache.hadoop.hive.ql.udf.ptf.WindowingTableFunction.iterator(WindowingTableFunction.java:538) at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.finishPartition(PTFOperator.java:349) at org.apache.hadoop.hive.ql.exec.PTFOperator.process(PTFOperator.java:123) at org.apache.hadoop.hive.ql.exec.Operator.baseForward(Operator.java:994) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:940) at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:927) at org.apache.hadoop.hive.ql.exec.GroupByOperator.forward(GroupByOperator.java:1050) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processAggr(GroupByOperator.java:850) at org.apache.hadoop.hive.ql.exec.GroupByOperator.processKey(GroupByOperator.java:724) at org.apache.hadoop.hive.ql.exec.GroupByOperator.process(GroupByOperator.java:790) ... 20 more
Semantically my query is valid (and indeed sometimes succeeds) so what is going on?
Note:
Created 08-12-2019 03:22 AM
Might be related to a container-reuse issue: HIVE-18786 -- perhaps disable tez.am.container.reuse.enabled in tez-site.xml to verify?
Created 08-12-2019 03:22 AM
Might be related to a container-reuse issue: HIVE-18786 -- perhaps disable tez.am.container.reuse.enabled in tez-site.xml to verify?
Created 09-04-2019 10:30 PM
Thanks you nailed it indeed. set hiveconf:tez.am.container.reuse.enabled=false; did the trick.