Hive to S3 Error - timeout?
Labels: Apache Hive
Created 06-13-2017 02:38 PM
We are using Hive to load data to S3 (using s3a). We've started seeing the following error:
2017-06-13 08:51:49,042 ERROR [main]: exec.Task (SessionState.java:printError(962)) - Failed with exception Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler). Response Code: 200, Response Text: OK
com.amazonaws.AmazonClientException: Unable to unmarshall response (Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler). Response Code: 200, Response Text: OK
    at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:738)
    at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:399)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:232)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3528)
    at com.amazonaws.services.s3.AmazonS3Client.copyObject(AmazonS3Client.java:1507)
    at com.amazonaws.services.s3.transfer.internal.CopyCallable.copyInOneChunk(CopyCallable.java:143)
    at com.amazonaws.services.s3.transfer.internal.CopyCallable.call(CopyCallable.java:131)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.copy(CopyMonitor.java:189)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:134)
    at com.amazonaws.services.s3.transfer.internal.CopyMonitor.call(CopyMonitor.java:46)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: com.amazonaws.AmazonClientException: Failed to parse XML document with handler class com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CopyObjectResultHandler
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:150)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseCopyObjectResponse(XmlResponsesSaxParser.java:417)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$CopyObjectUnmarshaller.unmarshall(Unmarshallers.java:192)
    at com.amazonaws.services.s3.model.transform.Unmarshallers$CopyObjectUnmarshaller.unmarshall(Unmarshallers.java:189)
    at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
    at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:44)
    at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:30)
    at com.amazonaws.http.AmazonHttpClient.handleResponse(AmazonHttpClient.java:712)
    ... 13 more
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:170)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:973)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:930)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:166)
    at org.apache.http.impl.io.SocketInputBuffer.fillBuffer(SocketInputBuffer.java:90)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:281)
    at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:251)
    at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:209)
    at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:171)
    at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
    at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.BufferedReader.fill(BufferedReader.java:161)
    at java.io.BufferedReader.read1(BufferedReader.java:212)
    at java.io.BufferedReader.read(BufferedReader.java:286)
    at org.apache.xerces.impl.XMLEntityScanner.load(Unknown Source)
    at org.apache.xerces.impl.XMLEntityScanner.skipSpaces(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentScannerImpl$PrologDispatcher.dispatch(Unknown Source)
    at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XML11Configuration.parse(Unknown Source)
    at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
    at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
    at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:141)
    ... 20 more
Anyone else seen this before? Is it a data size/length issue? Loading too much data at once? Timeout?
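For reference, a minimal sketch of the kind of load that hits this (database, table, and bucket names below are placeholders, not our real ones):

-- External table on S3 via the s3a connector
CREATE EXTERNAL TABLE IF NOT EXISTS example_db.example_target (
  id BIGINT,
  payload STRING
)
STORED AS ORC
LOCATION 's3a://example-bucket/warehouse/example_target/';

-- The stack trace above shows up intermittently while Hive copies the job output into S3
INSERT OVERWRITE TABLE example_db.example_target
SELECT id, payload
FROM example_db.staging_source;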
Created 06-13-2017 03:35 PM
This seems to be random: sometimes we see this error, and if we run the load again it succeeds. Not sure why we're seeing it, though.
Here are the Hive properties we're using:
set hive.execution.engine=mr;
set hive.default.fileformat=Orc;
set hive.exec.orc.default.compress=SNAPPY;
set fs.s3a.attempts.maximum=50;
set fs.s3a.connection.establish.timeout=30000;
set fs.s3a.connection.timeout=30000;
set fs.s3a.fast.upload=true;
set fs.s3a.fast.upload.buffer=disk;
set fs.s3n.multipart.uploads.enabled=true;
set fs.s3a.threads.keepalivetime=60;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
We're running HDP 2.4.2 (HDP-2.4.2.0-258).
Created 06-20-2017 02:10 PM
Here are the final Hive configs that seem to have fixed the issue; it appears to have been related to timeouts.
set hive.execution.engine=mr;
set hive.default.fileformat=Orc;
set hive.exec.orc.default.compress=SNAPPY;
set hive.exec.copyfile.maxsize=1099511627776;
set hive.warehouse.subdir.inherit.perms=false;
set hive.metastore.pre.event.listeners=;
set hive.stats.fetch.partition.stats=false;
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;
set fs.trash.interval=0;
set fs.s3.buffer.dir=/tmp/s3a;
set fs.s3a.attempts.maximum=50;
set fs.s3a.connection.establish.timeout=120000;
set fs.s3a.connection.timeout=120000;
set fs.s3a.fast.upload=true;
set fs.s3a.fast.upload.buffer=disk;
set fs.s3a.multiobjectdelete.enable=true;
set fs.s3a.max.total.tasks=2000;
set fs.s3a.threads.core=30;
set fs.s3a.threads.max=512;
set fs.s3a.connection.maximum=30;
set fs.s3a.fast.upload.active.blocks=12;
set fs.s3a.threads.keepalivetime=120;
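For anyone wanting to apply this, a hedged sketch of how we use it: the whole block of set statements above goes at the top of the load script, followed by the insert (script and table names below are placeholders, not our real ones):

-- load_to_s3.hql (illustrative name)
-- ... all of the set statements listed above go here ...

INSERT OVERWRITE TABLE example_db.example_target
SELECT id, payload
FROM example_db.staging_source;

The script is run with hive -f load_to_s3.hql; the same fs.s3a.* and hive.* properties could instead be set cluster-wide in core-site.xml / hive-site.xml rather than per script.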
Created 06-29-2017 07:45 PM
That error from AWS is suspected to be the S3 connection being broken and the XML parser in the Amazon SDK hitting the end of the document and failing. I'm surprised you are seeing it frequently, though; it's generally pretty rare (rare enough that we don't have much detail on what is going on).
It might be that fs.s3a.connection.timeout is the parameter to tune, but the other possibility is that you have too many threads/tasks talking to S3 and either your network bandwidth is used up or AWS S3 is actually throttling you. Try smaller values of fs.s3a.threads.max (say 64 or fewer) and of fs.s3a.max.total.tasks (try 128). That cuts down the number of threads which may write at a time, and leaves a smaller queue of blocks waiting to be written before it blocks whichever thread is actually generating the data.
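In set-statement terms, a minimal sketch of the throttled-down values to experiment with (these exact numbers are starting points, not tested recommendations):

-- Fewer concurrent S3A writer threads and a smaller queue of pending work
set fs.s3a.threads.max=64;
set fs.s3a.max.total.tasks=128;
-- Keep the longer timeouts from the earlier config while testing whether throttling is the real cause
set fs.s3a.connection.establish.timeout=120000;
set fs.s3a.connection.timeout=120000;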
