Created 08-28-2018 07:59 AM
Consider this example table:
create table config.test (param string, value string) partitioned by (jobid string) clustered by (param) into 3 buckets stored as orcfile tblproperties('transactional'='true');
I want this to be an ACID table, don't really need the partitioning and clustering, but creating this following all the guides. Assume I want to update some values:
insert into config.test partition(jobid) values ('1', '2', '3'); insert into config.test partition(jobid) values ('2', '3', '4'); insert into config.test partition(jobid) values ('4', '4', '4'); update config.test set value = '99' where jobid = '4' and param = '4';
Inserts work fine, Update returns the following "java.lang.NegativeArraySizeException" error:
Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_1535365185102_0002_140_01, diagnostics=[Task failed, taskId=task_1535365185102_0002_140_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) : attempt_1535365185102_0002_140_01_000000_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:366) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:556) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:508) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:213) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188) ... 15 more Caused by: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:357) ... 21 more ], TaskAttempt 1 failed, info=[Error: Error while running task ( failure ) : attempt_1535365185102_0002_140_01_000000_1:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:366) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:556) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:508) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:213) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188) ... 15 more Caused by: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:357) ... 21 more ], TaskAttempt 2 failed, info=[Error: Error while running task ( failure ) : attempt_1535365185102_0002_140_01_000000_2:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:366) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:556) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:508) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:213) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188) ... 15 more Caused by: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:357) ... 21 more ], TaskAttempt 3 failed, info=[Error: Error while running task ( failure ) : attempt_1535365185102_0002_140_01_000000_3:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172) at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73) at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61) at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37) at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36) at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:442) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:366) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:556) at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:508) at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376) at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:213) at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188) ... 15 more Caused by: java.lang.NegativeArraySizeException at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:357) ... 21 more ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1535365185102_0002_140_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
What could be the problem? I'm working on a perfectly running cluster and we do have some ACID tables working. But I can't figure out why this simple example doesn't work?
What works: The UPDATE works when "jobid = '4'" is NOT in the WHERE clause, but it updates too many rows.
Why is this a problem? I'm updating a "param" field which is neither bucketed nor a partition field, those two fields are only i the WHERE clause.
If this is really because the "jobid" is a partitioning field, I could create dummy/mirror fields that would have the same info but not be partition keys, but that does seem redundant.
Any Ideas?
Created 08-28-2018 08:13 AM
In other words, this works:
create table config.test (jobid string, param string, value string) partitioned by (dummy string) clustered by (param) into 3 buckets stored as orcfile tblproperties('transactional'='true'); insert into config.test partition(dummy) values ('1', '2', '3', '1'); insert into config.test partition(dummy) values ('2', '3', '4', '1'); insert into config.test partition(dummy) values ('4', '4', '4', '1'); update config.test set value = '99' where jobid = '4' and param = '4' ;
But it does require a "dummy" field which does not serve any purpose. Are there any cleaner workarounds?