Hive-Partition and bucket table creation failed

We are trying to create a new partitioned and bucketed (1,000 buckets) table from an existing partitioned table of size 750 GB. The mappers complete successfully, but during the reduce phase we get the error below and the reducers fail. In Ambari, the DataNode Process and DataNode Web UI alerts also report that the connection is not responding.

2017-01-10 16:45:30,869 [INFO] [Thread-90] |hdfs.DFSClient|: Exception in createBlockOutputStream
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:197)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57)
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2291)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1376)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1295)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:463)
2017-01-10 16:45:30,872 [INFO] [Thread-90] |hdfs.DFSClient|: Abandoning BP-1974649974--1481732158525:blk_1077768941_4031650
2017-01-10 16:45:30,891 [INFO] [Thread-90] |hdfs.DFSClient|: Excluding datanode DatanodeInfoWithStorage[:50010,DS-5dc52f7c-4497-457f-afca-c36a24b4f849,DISK]

The execution engine is Tez.

Any help would be greatly appreciated.

2 REPLIES

Re: Hive-Partition and bucket table creation failed

@VAMSI GUMMALLA

It seems like there are too many open files. Please check the ulimit value for the OS (limits.conf) and for the DataNode (in hadoop-env).

Regards,

Anubhav
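
As a quick way to act on the suggestion above, the following is a minimal sketch that prints the open-file limit of the process running it. It assumes a Unix-like host with Python available. The helper names open_file_limits and describe are made up for this illustration. It reports the limit for the current user's process only, so to learn what the DataNode actually runs with you would need to run it as the DataNode's service user or inspect /proc/<pid>/limits for the DataNode process.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) open-file-descriptor limits of this process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


def describe(limit):
    """Render a limit value, mapping the 'no limit' sentinel to text."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


if __name__ == "__main__":
    soft, hard = open_file_limits()
    # The soft limit is what the process is actually held to;
    # the hard limit is the ceiling the soft limit can be raised to.
    print(f"open files: soft={describe(soft)} hard={describe(hard)}")
```

If the soft limit is low compared with the number of files and sockets the job keeps open at once, raising it in limits.conf and in the DataNode environment is the change the reply points to.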

Re: Hive-Partition and bucket table creation failed

Hi @VAMSI GUMMALLA.

I have one question about your bucketing.

I can't see your DDL, so I have to assume that you are creating a table that is partitioned and that also has 1,000 buckets. Is that true? If so, each partition will have 1,000 files, one per bucket.

So if you partition by day, then every day you will be creating 1,000 new files, which is 365,000 per year. Once you get past approximately 10,000 partitions you can have performance problems.

I would check your requirement to bucket into 1,000 buckets, or even to bucket at all. I recommend keeping your number of files reasonable. That number of buckets would definitely put pressure on your table create query, especially if you are doing dynamic partitioning.

I hope this helps.
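
To make the file-count arithmetic in the reply above concrete, here is a small sketch. It assumes one file per bucket per partition with every bucket non-empty, and it ignores HDFS replication. The 750 GB size and the 1,000 buckets come from the question. The 365 daily partitions are the reply's own hypothetical, since the real partition count was not posted. The function names are invented for this example.

```python
def total_files(partitions, buckets):
    """Files in the table, assuming one file per bucket in every partition."""
    return partitions * buckets


def avg_file_size_mb(total_size_gb, partitions, buckets):
    """Average file size in MB if the data were spread evenly over all files."""
    return total_size_gb * 1024 / total_files(partitions, buckets)


if __name__ == "__main__":
    buckets = 1000      # from the question
    partitions = 365    # hypothetical: one partition per day for a year
    size_gb = 750       # from the question

    print("files:", total_files(partitions, buckets))
    print("average file size (MB):",
          round(avg_file_size_mb(size_gb, partitions, buckets), 2))
```

Plugging in the real partition count shows how many files the insert has to create and how small each one ends up, which is a useful check before settling on a bucket count.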
