Created 02-17-2016 12:35 PM
I have a Hadoop cluster (HDP 2.2) set up in a Eucalyptus environment. I have created an external table in Hive (0.14) using the query below:
CREATE EXTERNAL TABLE tempbatting (col_value STRING) LOCATION 's3n://hive-bucket/';
I'm using a custom S3 location, so I have set the following jets3t properties in the Hive configuration directory:
s3service.https-only=true
s3service.s3-endpoint=s3-customlocation.net
s3service.s3-endpoint-http-port=80
s3service.s3-endpoint-https-port=443
s3service.disable-dns-buckets=true
s3service.enable-storage-classes=false
Though I'm able to execute simple SELECT queries on the table successfully, aggregate queries are failing. Below are the logs:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to hive-bucket.s3.amazonaws.com:443 timed out
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:416)
From the logs, the map-reduce job seems to be accessing Amazon S3. I have tried using the Hive set command (set fs.s3n.endpoint=s3-customlocation.net), but it didn't seem to work. Is there a way to specify a custom endpoint?
Created 02-17-2016 12:38 PM
This is from the AWS forums:
"I tried to rerun my job again and this time it is finished successfully. So I guess it may be related with s3 service unstable in rent two days at least from error message.
I hope this issue not happen again."
Created 02-17-2016 12:41 PM
I'm using a custom S3 for Eucalyptus, not the AWS one. I have been trying to resolve this for the past few weeks.
Created 02-17-2016 12:57 PM
OK, just to repeat: you can access S3 through Hive with simple queries? So it cannot be a connection problem, right?
Perhaps too many parallel connections timing out when all the mappers spin up?
Do you see some tasks completing successfully and then some tasks failing after 3 retries? In that case it sounds like a timeout issue.
I have seen similar issues on Google where people tried to fix it by increasing connection timeouts and retries, though mostly on Presto forums.
However, there are S3 parameters available in the Hadoop core-site configuration, for example:
https://hadoop.apache.org/docs/r2.6.3/hadoop-project-dist/hadoop-common/core-default.xml
fs.s3a.connection.timeout
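As a rough sketch (the property names are from the core-default.xml linked above; the values are only illustrative and would need tuning for your cluster), the s3a timeout and retry settings could be raised in core-site.xml:

<property>
  <name>fs.s3a.connection.timeout</name>
  <value>200000</value>
  <!-- socket timeout; illustrative value, see core-default.xml for the default and units -->
</property>
<property>
  <name>fs.s3a.attempts.maximum</name>
  <value>20</value>
  <!-- how many times the client retries on transient errors; illustrative value -->
</property>
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>100</value>
  <!-- cap on simultaneous connections, relevant when many mappers start at once; illustrative value -->
</property>

Note these apply to the s3a connector; the s3n connector used in your table location goes through JetS3t, which takes its HTTP settings from jets3t.properties instead.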
Created 02-17-2016 01:08 PM
Thanks for the response. Yes, I'm able to access S3 through simple Hive queries. From the logs, I can see that the map-reduce job is trying to connect to "hive-bucket.s3.amazonaws.com:443", which doesn't exist. I need to connect to a custom S3 endpoint, "s3-customlocation.net". I have gone through the hdfs-site configuration, but I couldn't find any parameter to set a custom endpoint.
Created 02-17-2016 01:33 PM
https://issues.apache.org/jira/browse/HADOOP-11261
Which version of Hadoop are you using?
"It also enables using a custom url pointing to an S3-compatible object store."
Created 02-17-2016 04:34 PM
I'm using Hadoop 2.6.
Created 02-17-2016 07:23 PM
I only understand half of these S3 problems, but it might be that you need to upgrade if a custom URL is what you want.
https://issues.apache.org/jira/browse/HADOOP-11261
"It also enables using a custom url pointing to an S3-compatible object store."
Created 02-20-2016 06:19 PM
@phoncy Joseph any progress on this?
Created 02-22-2016 10:09 AM
@Artem Ervits I copied jets3t.properties to all the data nodes. Currently I'm getting the below exception:
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message:
<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/hive-bucket</Resource><RequestId></RequestId></Error>
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:470)
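A 403 AccessDenied on the s3n path usually points at credentials rather than connectivity. For reference, one thing worth checking (a sketch only, with placeholder keys; whether Eucalyptus needs anything beyond this is an assumption) is that the s3n access keys are visible to the MapReduce tasks on every node, e.g. in core-site.xml:

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
  <!-- placeholder: the access key for the custom S3 endpoint -->
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
  <!-- placeholder: the matching secret key -->
</property>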