
HIVE ACID: UPDATE error when partition column is in the WHERE clause


Consider this example table:

create table config.test (param string, value string) 
partitioned by (jobid string) 
clustered by (param) into 3 buckets
stored as orcfile
tblproperties('transactional'='true');

I want this to be an ACID table. I don't really need the partitioning and clustering, but I created it this way following all the guides. Now assume I insert a few rows and want to update one value:

insert into config.test partition(jobid) values ('1', '2', '3');
insert into config.test partition(jobid) values ('2', '3', '4');
insert into config.test partition(jobid) values ('4', '4', '4');

update config.test set value = '99' where jobid = '4' and param = '4';
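
(For anyone reproducing this: the INSERTs above use dynamic partitioning, so depending on your cluster defaults you may first need session settings along these lines; on our cluster the inserts succeed as-is.)

-- Standard Hive settings for dynamic-partition inserts; only needed
-- if your cluster defaults don't already allow them.
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;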

The inserts work fine; the UPDATE fails with the following java.lang.NegativeArraySizeException error:

Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Reducer 2, vertexId=vertex_1535365185102_0002_140_01, diagnostics=[Task failed, taskId=task_1535365185102_0002_140_01_000000, diagnostics=[TaskAttempt 0 failed, info=[Error: Error while running task ( failure ) :
attempt_1535365185102_0002_140_01_000000_0:java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:218)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:172)
	at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:370)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:73)
	at org.apache.tez.runtime.task.TaskRunner2Callable$1.run(TaskRunner2Callable.java:61)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:61)
	at org.apache.tez.runtime.task.TaskRunner2Callable.callInternal(TaskRunner2Callable.java:37)
	at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
	at org.apache.hadoop.hive.llap.daemon.impl.StatsRecordingThreadPool$WrappedCallable.call(StatsRecordingThreadPool.java:110)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.NegativeArraySizeException
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:442)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:366)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:556)
	at org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:508)
	at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:376)
	at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordProcessor.init(ReduceRecordProcessor.java:213)
	at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:188)
	... 15 more
Caused by: java.lang.NegativeArraySizeException
	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.initializeOp(FileSinkOperator.java:357)
	... 21 more
], TaskAttempt 1, TaskAttempt 2, and TaskAttempt 3 failed with identical stack traces (elided here)], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1535365185102_0002_140_01 [Reducer 2] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0

What could be the problem? The cluster itself runs fine and we already have other ACID tables working, but I can't figure out why this simple example fails.

What works: the UPDATE does run when "jobid = '4'" is NOT in the WHERE clause, but then it updates too many rows.
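
For reference, this is the variant that runs without error; it just cannot be restricted to a single jobid, so in general it touches matching rows in every partition:

-- Works, but cannot filter on the partition column.
update config.test set value = '99' where param = '4';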

Why is this even a problem? I'm only updating the "value" field, which is neither a bucketing nor a partition column; "jobid" and "param" appear only in the WHERE clause.

If this really is because "jobid" is a partition column, I could create dummy/mirror fields that carry the same information without being partition keys, but that seems redundant.

Any ideas?

1 Reply


In other words, this works:

    create table config.test (jobid string, param string, value string)
    partitioned by (dummy string)
    clustered by (param) into 3 buckets
    stored as orcfile
    tblproperties('transactional'='true');


    insert into config.test partition(dummy) values ('1', '2', '3', '1');
    insert into config.test partition(dummy) values ('2', '3', '4', '1');
    insert into config.test partition(dummy) values ('4', '4', '4', '1');
    
    update config.test set value = '99' where jobid = '4' and param = '4';

But it does require a "dummy" partition column that serves no purpose. Are there any cleaner workarounds?
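
The only cleaner alternative I can think of is to drop the partitioning entirely and keep just the bucketing. A sketch (not verified on this cluster), viable only if the partitioning really isn't needed, as the question suggests:

    -- Unpartitioned ACID table: with no partition column at all, the
    -- WHERE clause can never reference one, which should sidestep the
    -- failure. Assumes partitioning is genuinely unnecessary here.
    create table config.test (jobid string, param string, value string)
    clustered by (param) into 3 buckets
    stored as orcfile
    tblproperties('transactional'='true');

    insert into config.test values ('1', '2', '3');
    insert into config.test values ('2', '3', '4');
    insert into config.test values ('4', '4', '4');

    update config.test set value = '99' where jobid = '4' and param = '4';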