Hive aggregate query failing for External table
Labels: Apache Hive
Created ‎02-17-2016 12:35 PM
I have a Hadoop cluster (HDP 2.2) set up in a Eucalyptus environment. I have created an external table in Hive (0.14) using the query below:
CREATE EXTERNAL TABLE tempbatting (col_value STRING) LOCATION 's3n://hive-bucket/';
I'm using a custom S3 location, so I have set the jets3t properties in the Hive configuration directory as below:
set s3service.https-only = true;
set s3service.s3-endpoint = s3-customlocation.net;
set s3service.s3-endpoint-http-port = 80;
set s3service.s3-endpoint-https-port = 443;
set s3service.disable-dns-buckets = true;
set s3service.enable-storage-classes = false;
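(For reference, a plain jets3t.properties file would normally use bare key=value entries rather than the set ...; syntax; a minimal sketch with the same values:)

s3service.https-only=true
s3service.s3-endpoint=s3-customlocation.net
s3service.s3-endpoint-http-port=80
s3service.s3-endpoint-https-port=443
s3service.disable-dns-buckets=true
s3service.enable-storage-classes=false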
Though I'm able to execute simple select queries on the table successfully, the aggregate queries are failing. Below are the logs:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to hive-bucket.s3.amazonaws.com:443 timed out
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:416)
From the logs, the map-reduce job appears to be connecting to Amazon S3. I have tried using the set command in Hive (set fs.s3n.endpoint=s3-customlocation.net), but it didn't seem to work. Is there a way to specify a custom endpoint?
Created ‎02-17-2016 12:38 PM
This is from the AWS forums:
"I tried to rerun my job again and this time it finished successfully. So I guess it may be related to the S3 service being unstable over the recent two days, at least going by the error message. I hope this issue does not happen again."
Created ‎02-17-2016 12:41 PM
I'm using a custom S3 for Eucalyptus, not the AWS one. I have been trying to resolve this for the past few weeks.
Created ‎02-17-2016 12:57 PM
OK, just to repeat: you can access S3 through Hive with simple queries? So it cannot be a basic connectivity problem, right?
Perhaps too many parallel connections are timing out when all the mappers spin up?
Do you see some tasks completing successfully and then some failing after 3 retries? In that case it sounds like a timeout issue.
I have seen similar issues reported elsewhere (mostly in Presto forums) where people tried to fix them by increasing connection timeouts and retries.
However, there are S3 parameters available in the Hadoop configuration, e.g. fs.s3a.connection.timeout:
https://hadoop.apache.org/docs/r2.6.3/hadoop-project-dist/hadoop-common/core-default.xml
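A minimal core-site.xml sketch for raising that timeout (the 200000 ms value here is only illustrative, not a recommendation from this thread):

<property>
  <name>fs.s3a.connection.timeout</name>
  <value>200000</value>
</property>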
Created ‎02-17-2016 01:08 PM
Thanks for the response. Yes, I'm able to access S3 through simple Hive queries. From the logs, I can see that the map-reduce job is trying to connect to "hive-bucket.s3.amazonaws.com:443", which doesn't exist. I need to connect to a custom S3 endpoint, which is "s3-customlocation.net". I have gone through the hdfs-site configuration, but I couldn't find any parameter to set a custom endpoint.
Created ‎02-17-2016 01:33 PM
https://issues.apache.org/jira/browse/HADOOP-11261
Which version of Hadoop are you using?
"It also enables using a custom url pointing to an S3-compatible object store."
Created ‎02-17-2016 04:34 PM
I'm using Hadoop 2.6.
Created ‎02-17-2016 07:23 PM
I only understand half of these S3 problems, but it might be that you need to upgrade if a custom URL is what you want.
https://issues.apache.org/jira/browse/HADOOP-11261
"It also enables using a custom url pointing to an S3-compatible object store."
- Fix Version/s: 2.7.0
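If an upgrade is an option: starting with Hadoop 2.7, a custom endpoint would normally go into core-site.xml via fs.s3a.endpoint (and the table LOCATION would then use the s3a:// scheme instead of s3n://). A sketch, assuming the endpoint mentioned earlier in this thread:

<property>
  <name>fs.s3a.endpoint</name>
  <value>s3-customlocation.net</value>
</property>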
Created ‎02-20-2016 06:19 PM
@phoncy Joseph any progress on this?
Created ‎02-22-2016 10:09 AM
@Artem Ervits I copied jets3t.properties to all data nodes. Currently I'm getting the below exception:
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message:
<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/hive-bucket</Resource><RequestId></RequestId></Error>
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:470)
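In case it helps: a 403 AccessDenied from s3n usually means the request reached the object store but was rejected, so it may be worth checking that the access keys are visible on the task nodes as well. On Hadoop 2.6 the s3n keys would normally be set in core-site.xml, along these lines (placeholder values, not taken from this thread):

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>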
