Reply
New Contributor
Posts: 1
Registered: ‎08-26-2014

inserting to hive table using dynamic partition Reduce tasks failed for compartively larger data set

I am trying to insert data to a hive table using following code:

from books b insert overwrite table books_part partition (yop) select b.isbn, b.title, b.author,
b.publisher, b.img_url_small, b.img_url_medium, b.img_url_large,b.yop distribute by yop;

 

The Map job is completed but the Reduce job failed with following error log:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"038512158X","_col1":"No One Noticed Ralph (Reading-on-My-Own Book)","_col2":"Bonnie Bishop","_col3":"Random House Childrens Books","_col4":"http://images.amazon.com/images/P/038512158X.01.THUMBZZZ.jpg","_col5":"http://images.amazon.com/imag...
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:268)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:468)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":"038512158X","_col1":"No One Noticed Ralph (Reading-on-My-Own Book)","_col2":"Bonnie Bishop","_col3":"Random House Childrens Books","_col4":"http://images.amazon.com/images/P/038512158X.01.THUMBZZZ.jpg","_col5":"http://images.amazon.com/imag...
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:256)
... 7 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hive-root/hive_2014-09-15_11-33-50_652_9032520166014145621/_tmp.-ext-10000/yop=1979/_tmp.000000_0 could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1469)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)

at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:573)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:744)
at org.apache.hadoop.hive.ql.exec.ExtractOperator.processOp(ExtractOperator.java:45)
at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
at org.apache.hadoop.hive.ql.exec.ExecReducer.reduce(ExecReducer.java:247)
... 7 more
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hive-root/hive_2014-09-15_11-33-50_652_9032520166014145621/_tmp.-ext-10000/yop=1979/_tmp.000000_0 could only be replicated to 0 nodes, instead of 1
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1469)
at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:649)
at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:557)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1415)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1411)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1409)

at org.apache.hadoop.ipc.Client.call(Client.java:1104)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:226)
at $Proxy1.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor16.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at $Proxy1.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3185)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3055)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2305)
at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2500)

 What should I do to fix the issue?

 

please notice that it ran successfuly while selecting from a table having 10 records. But for the complete table having around 350000 records it failed. I am using CDH3 quckstart VM.

Thanks!