Hive to S3 Error - timeout?

Rising Star

We are using Hive to load data to S3 (using s3a). We've started seeing the following error:

2017-06-13 08:51:49,042 ERROR [main]: exec.Task (SessionState.java:printError(962)) - Failed with exception Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler). Response Code: 200, Response Text: OK
com.amazonaws.AmazonClientException: Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler). Response Code: 200, Response Text: OK
    at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:738)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:399)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
    at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1507)
    at com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInOneChunk(CopyCallable.java:143)
    at com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:131)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.copy(CopyMonitor.java:189)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:134)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:150)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseCopyObjectResponse(XmlResponsesSaxParser.java:417)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$CopyObjectUnmarshaller.unmarshall(Unmarshallers.java:192)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$CopyObjectUnmarshaller.unmarshall(Unmarshallers.java:189)
    at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
    at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:44)
    at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:30)
    at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:712)
    ... 13 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
    at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:251)
    at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:209)
    at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:171)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.read1(BufferedReader.java:212)
    at java.io.BufferedReader.read(BufferedReader.java:286)
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.skipSpaces(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:141)
    ... 20 more

Has anyone else seen this before? Is it a data size/length issue? Loading too much data at once? A timeout?
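
For context, the load is essentially a dynamic-partition insert into an s3a-backed ORC table, roughly like the sketch below (the table, column, and bucket names are placeholders, not our real ones):

-- Hypothetical sketch of the kind of load that hits this error;
-- table, column, and bucket names are made up for illustration.
CREATE EXTERNAL TABLE IF NOT EXISTS events_s3 (
  id BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING)
STORED AS ORC
LOCATION 's3a://example-bucket/warehouse/events_s3';

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- The error surfaces while task output is being copied into S3
-- (note the AmazonS3Client.copyObject frame in the trace above).
INSERT OVERWRITE TABLE events_s3 PARTITION (dt)
SELECT id, payload, dt
FROM events_staging;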

1 ACCEPTED SOLUTION

Rising Star

Here are the final Hive configs that seem to have fixed this issue; it appears to have been timeout-related.

set hive.execution.engine=mr;
set hive.default.fileformat=Orc;
set hive.exec.orc.default.compress=SNAPPY;
set hive.exec.copyfile.maxsize=1099511627776;
set hive.warehouse.subdir.inherit.perms=false;
set hive.metastore.pre.event.listeners=;
set hive.stats.fetch.partition.stats=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set fs.trash.interval=0;
set fs.s3.buffer.dir=/tmp/s3a;
set fs.s3a.attempts.maximum=50;
set fs.s3a.connection.establish.timeout=120000;
set fs.s3a.connection.timeout=120000;
set fs.s3a.fast.upload=true;
set fs.s3a.fast.upload.buffer=disk;
set fs.s3a.multiobjectdelete.enable=true;
set fs.s3a.max.total.tasks=2000;
set fs.s3a.threads.core=30;
set fs.s3a.threads.max=512;
set fs.s3a.connection.maximum=30;
set fs.s3a.fast.upload.active.blocks=12;
set fs.s3a.threads.keepalivetime=120;
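
A quick way to double-check that these took effect in a session: issuing set with just the property name makes Hive print the current value, e.g.:

set fs.s3a.connection.timeout;
set fs.s3a.threads.max;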


3 REPLIES

Rising Star

This seems to be random: sometimes we see this error, and if we run it again it succeeds. Not sure why we're seeing it, though.

Here are the Hive properties we're using:

set hive.execution.engine=mr;
set hive.default.fileformat=Orc;
set hive.exec.orc.default.compress=SNAPPY;
set fs.s3a.attempts.maximum=50;
set fs.s3a.connection.establish.timeout=30000;
set fs.s3a.connection.timeout=30000;
set fs.s3a.fast.upload=true;
set fs.s3a.fast.upload.buffer=disk;
set fs.s3n.multipart.uploads.enabled=true;
set fs.s3a.threads.keepalivetime=60;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

We're running HDP 2.4.2 (HDP-2.4.2.0-258).
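
(Side note: fs.s3n.multipart.uploads.enabled in the list above belongs to the older s3n connector, so it likely has no effect on s3a paths. For s3a, multipart upload behaviour is governed by properties along these lines; the values shown are the usual defaults, for illustration only, not tuned recommendations:

set fs.s3a.multipart.size=104857600;       -- part size for multipart uploads (100 MB)
set fs.s3a.multipart.threshold=2147483647; -- file size above which multipart upload kicks in
)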


That error from AWS is suspected to be the S3 connection being broken and the XML parser in the Amazon SDK hitting the end of the document and failing. I'm surprised you are seeing it frequently, though; it's generally pretty rare (rare enough that we don't have much detail on what is going on).

It might be that fs.s3a.connection.timeout is the parameter to tune, but the other possibility is that you have too many threads/tasks talking to S3 and either your network bandwidth is used up or AWS S3 is actually throttling you. Try smaller values of fs.s3a.threads.max (say 64 or fewer) and of fs.s3a.max.total.tasks (try 128). That cuts down the number of threads which may write at a time, and leaves a smaller queue of blocks waiting to be written before it blocks whatever thread is actually generating the data.
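
Expressed as Hive session settings, that suggestion would look something like this (the values are starting points to experiment with, taken from the numbers above, not tested recommendations):

-- Throttle back s3a concurrency (illustrative starting values)
set fs.s3a.threads.max=64;
set fs.s3a.max.total.tasks=128;
-- keep the longer timeout from the accepted configs while testing
set fs.s3a.connection.timeout=120000;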