bucketId out of range: 4147

dieden9 — Tue, 13 Oct 2020 07:13:45 GMT

Hi!

I am running a scheduled job that consists of an insert-select query in Hive 3.0 on HDP 3, as follows:

INSERT INTO TABLE t1 SELECT * FROM t2 WHERE timestamp > "predefined timestamp"

The job was running flawlessly until, all of a sudden, it started failing with the following error:

Caused by: java.lang.IllegalArgumentException: bucketId out of range: 4147
[2020-10-13 06:58:12,214] INFO - 	at org.apache.hadoop.hive.ql.io.BucketCodec$2.encode(BucketCodec.java:94)
[2020-10-13 06:58:12,214] INFO - 	at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:271)
[2020-10-13 06:58:12,214] INFO - 	at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:278)
[2020-10-13 06:58:12,214] INFO - 	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordUpdater(HiveFileFormatUtils.java:350)
[2020-10-13 06:58:12,214] INFO - 	at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getAcidRecordUpdater(HiveFileFormatUtils.java:336)
[2020-10-13 06:58:12,214] INFO - 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketForFileIdx(FileSinkOperator.java:801)
[2020-10-13 06:58:12,214] INFO - 	at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:741)
[2020-10-13 06:58:12,214] INFO - 	... 45 more
[2020-10-13 06:58:12,214] INFO - ]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:3493, Vertex vertex_1602198520469_101787_31_02 [Map 1] killed/failed due to:OWN_TASK_FAILURE]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1602198520469_101787_31_03, diagnostics=[Vertex received Kill while in RUNNING state., Vertex did not succeed due to OTHER_VERTEX_FAILURE, failedTasks:0 killedTasks:1, Vertex vertex_1602198520469_101787_31_03 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (state=08S01,code=2)

I have no idea what is causing this, especially since the job hasn't changed.

Any idea of how I can solve this issue? 😞

Re: bucketId out of range: 4147

balajip — Tue, 13 Oct 2020 07:54:34 GMT

It seems you are hitting the limit on the maximum number of buckets in Hive.
For more information, please refer to the Apache JIRA below.
https://issues.apache.org/jira/browse/TEZ-4130
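
To illustrate why 4147 is rejected, here is a minimal sketch of the range check that the stack trace points to (BucketCodec.encode). It assumes the bucket ID is packed into a 12-bit field, which would put the maximum at 4095; the class name, method names, and exact bit layout below are illustrative assumptions, not a copy of Hive's source.

```java
public class BucketIdLimitSketch {
    // Assumption: 12 bits are reserved for the bucket ID, so the largest value is 4095.
    static final int NUM_BUCKET_ID_BITS = 12;
    static final int MAX_BUCKET_ID = (1 << NUM_BUCKET_ID_BITS) - 1;

    // Packs a codec version, bucket ID and statement ID into one int (assumed layout).
    static int encode(int bucketId, int statementId) {
        if (bucketId < 0 || bucketId > MAX_BUCKET_ID) {
            // Same message text as the one reported in the thread.
            throw new IllegalArgumentException("bucketId out of range: " + bucketId);
        }
        // Version 1 in the top 3 bits, bucket ID starting at bit 16, statement ID in the low bits.
        return (1 << 29) | (bucketId << 16) | (statementId >= 0 ? statementId : 0);
    }

    // Recovers the bucket ID from an encoded value.
    static int decodeBucketId(int encoded) {
        return (encoded >>> 16) & MAX_BUCKET_ID;
    }

    public static void main(String[] args) {
        // The largest bucket ID that fits round-trips cleanly.
        System.out.println(decodeBucketId(encode(MAX_BUCKET_ID, 0)));
        try {
            encode(4147, 0); // the value from the error in this thread
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Under this assumption, any writer that is assigned an ID above 4095 fails at encode time, which would be consistent with a job that starts failing only once the number of writing tasks grows past that threshold.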

Re: bucketId out of range: 4147

dieden9 — Wed, 14 Oct 2020 07:23:11 GMT

Hi @balajip

Thanks for the reply. Excuse my ignorance, as I am still new to the Cloudera platform.
Is there a config that I can set to override the bucket limit? Or should I apply that patch?
