We are trying to create a new partitioned and bucketed (1,000 buckets) table from an existing partitioned table of size 750 GB. The mappers complete successfully, but during the reduce phase we get the error below and the reducers fail. In Ambari, the DataNode Process and DataNode Web UI alerts show that the connection is not responding.
2017-01-10 16:45:30,869 [INFO] [Thread-90] |hdfs.DFSClient|: Exception in createBlockOutputStream
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2291)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1376)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1295)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
2017-01-10 16:45:30,872 [INFO] [Thread-90] |hdfs.DFSClient|: Abandoning BP-1974649974--1481732158525:blk_1077768941_4031650
2017-01-10 16:45:30,891 [INFO] [Thread-90] |hdfs.DFSClient|: Excluding datanode DatanodeInfoWithStorage[:50010,DS-5dc52f7c-4497-457f-afca-c36a24b4f849,DISK]
Any help would be greatly appreciated.
Hi @VAMSI GUMMALLA.
I have one question about your bucketing.
I can't see your DDL, but I have to assume that you are creating a table that is partitioned and that also has 1,000 buckets. Is that true? If so, each partition will contain 1,000 files, one per bucket.
So if you partition by day, then every day you will be creating 1,000 new files, roughly 365,000 per year. Once you get past approximately 10,000 partitions you can run into performance problems.
I would revisit your requirement to bucket into 1,000 buckets, or whether to bucket at all. The general recommendation is to keep your number of files reasonable. That many buckets will definitely put pressure on your table-creation query, especially if you are using dynamic partitioning.
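As a rough sketch of what I mean, here is a hypothetical DDL with a much smaller bucket count. The table and column names are illustrative only (not from your post), and the right bucket count depends on your data volume and bucket key cardinality:

```sql
-- Hypothetical example: daily-partitioned table with 32 buckets
-- instead of 1,000; names are made up for illustration.
CREATE TABLE sales_bucketed (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DECIMAL(10,2)
)
PARTITIONED BY (ds STRING)
CLUSTERED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC;

-- Populate with dynamic partitioning; each partition now produces
-- 32 files (one per bucket) rather than 1,000.
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.enforce.bucketing=true;
INSERT OVERWRITE TABLE sales_bucketed PARTITION (ds)
SELECT order_id, customer_id, amount, ds
FROM sales_source;
```

With a daily partition and 32 buckets you would create about 11,680 files per year instead of 365,000, which is far gentler on the NameNode and on the reducers writing the files.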
I hope this helps.