Support Questions
Find answers, ask questions, and share your expertise

Hive job is failing after a long run with "Cannot add more than 2147483647 elements to a PTF Partition hive" error

INSERT INTO TABLE datamart.fact
SELECT row_number() over(order by visitor_key) as clickstream_key,

When I run this query, it launches around 169 map tasks and 1009 reduce tasks. All the map tasks and 1008 of the reduce tasks complete in less than 30 minutes. However, the last reduce task takes too long and keeps launching new attempts rather than failing the job with an error.

I have gone through the logs; there weren't any other warnings or notable errors except for this:

[TezChild] |tez.TezProcessor|: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{"reducesinkkey0":0,"reducesinkkey1":1},"value":{"_col0":14858534,"_col1":96293756,"_col2":13528511,"_col3":"2016-03-06 12:51:14","_col4":17,"_col5":"","_col6":"","_col7":";;;;;103=::hash::0|104=::hash::0|111=::hash::0|133=::hash::0","_col8":"","_col9":"","_col10":null}}
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource.pushRecord(
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Cannot add more than 2147483647 elements to a PTFPartition
    at org.apache.hadoop.hive.ql.exec.PTFPartition.append(
    at org.apache.hadoop.hive.ql.exec.PTFOperator$PTFInvocation.processRow(
    at org.apache.hadoop.hive.ql.exec.PTFOperator.process(
    at org.apache.hadoop.hive.ql.exec.Operator.forward(
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(
    at org.apache.hadoop.hive.ql.exec.tez.ReduceRecordSource$
    ... 17 more

Does this have anything to do with my existing configuration?

The mapper and reducer sizes are 10 GB and 20 GB, respectively.
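For context, container sizes like these are typically set through properties along the following lines. This is only an illustration of how such a configuration might look; the values are assumed, not taken from the poster's actual cluster settings.

```sql
-- Hypothetical sketch of how ~10 GB mapper / ~20 GB reducer containers
-- might be configured for Hive on Tez (values in MB, assumed for illustration).
SET hive.tez.container.size=10240;     -- Tez task container size
SET mapreduce.map.memory.mb=10240;     -- mapper container memory
SET mapreduce.reduce.memory.mb=20480;  -- reducer container memory
```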

Is there a workaround for this? When I was googling the contents of the logs, I found that there is an existing bug in Tez ( ). Is this the same bug I am encountering here? I would really appreciate it if someone could help me with this.

Note :

Number of records in cf: 2968859945

Size of cf : 26.5 G


Super Guru
@vinay kumar

I am not a hundred percent sure, but I think you need to reduce your reducer size. You are getting this error in the following file, which expects the number of rows for a particular reducer to be less than Integer.MAX_VALUE (see line 99). I think you have more than 2147483647 rows being processed by this one reducer. If you reduce the size of the reducers so that no reducer processes more than 2147483647 records, you should not run into this issue.
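Reducer granularity in Hive is usually controlled with settings along these lines; the exact values below are assumptions for illustration, not a tested recommendation.

```sql
-- Hypothetical sketch: lower the data volume assigned to each reducer
-- so more reducers are launched and each one processes fewer rows.
SET hive.exec.reducers.bytes.per.reducer=268435456;  -- 256 MB per reducer (assumed)
SET hive.exec.reducers.max=2000;                     -- allow more reducers (assumed)
```

Note that whether this helps depends on how the rows are distributed: a window function with a single global ORDER BY can still funnel every row through one reducer, as the later replies suggest.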

I hope this helps.


Should this be filed in the Hive JIRA?


It seems the error is due to the return type of row_number(), which is int.

I am trying to generate unique keys for around 3 billion records, which is more than the maximum value of the integer type (2147483647).


@vinay kumar

I think the "order by visitor_key" is consuming a huge amount of memory to process.

I would suggest running it without the "Order By".
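One possible way to keep the keys unique without a single global sort is sketched below. The table and column names follow the original query; source_table, the select-list columns (elided as "..."), and the bucket count of 64 are assumptions for illustration.

```sql
-- Hypothetical sketch: number rows within hashed buckets, then combine
-- the bucket id and the per-bucket row number into a unique BIGINT key.
-- No single reducer has to order (or hold) all ~3 billion rows, so no
-- PTF partition approaches the 2147483647-element limit.
INSERT INTO TABLE datamart.fact
SELECT (CAST(bucket_id AS BIGINT) * 1000000000) + rn AS clickstream_key,
       ...
FROM (
  SELECT pmod(hash(visitor_key), 64) AS bucket_id,
         row_number() OVER (PARTITION BY pmod(hash(visitor_key), 64)
                            ORDER BY visitor_key) AS rn,
         ...
  FROM source_table
) t;
```

The resulting keys are unique but not consecutive across buckets, and the multiplier must exceed the largest per-bucket row count (with 64 buckets over ~3 billion rows, roughly 46 million rows each, so well under the one-billion offset assumed here).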