
Hive aggregate query failing for External table

Rising Star

I have a Hadoop cluster (HDP 2.2) set up in a Eucalyptus environment. I have created an external table in Hive (0.14) using the query below:

CREATE EXTERNAL TABLE tempbatting (col_value STRING) LOCATION 's3n://hive-bucket/';

I'm using a custom S3 location, so I have set the following jets3t properties in the Hive configuration directory:

set s3service.https-only = true;
set s3service.s3-endpoint = s3-customlocation.net;
set s3service.s3-endpoint-http-port = 80;
set s3service.s3-endpoint-https-port = 443;
set s3service.disable-dns-buckets = true;
set s3service.enable-storage-classes = false;
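If these settings are meant to live in a `jets3t.properties` file (which jets3t reads from the classpath, and which a later reply mentions copying to the data nodes), the usual Java properties format is plain `key=value` pairs without the `set`/`;` syntax. A sketch, reusing the same endpoint as above:

```properties
# jets3t.properties -- must be on the classpath of every node that runs tasks
s3service.https-only=true
s3service.s3-endpoint=s3-customlocation.net
s3service.s3-endpoint-http-port=80
s3service.s3-endpoint-https-port=443
s3service.disable-dns-buckets=true
s3service.enable-storage-classes=false
```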

Though I'm able to execute simple SELECT queries on the table successfully, aggregate queries are failing. Below are the logs:

Error: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
 
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to hive-bucket.s3.amazonaws.com:443 timed out
        at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:416)

From the logs, the map-reduce job seems to be accessing Amazon S3 directly. I have tried using the set command in Hive (set fs.s3n.endpoint=s3-customlocation.net), but it didn't seem to work. Is there a way to specify a custom endpoint?
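For reference, the split in behavior is consistent with how Hive executes these statements: a simple SELECT can be served by a client-side fetch task, while an aggregate launches a map-reduce job whose mapper tasks open their own S3 connections on the worker nodes. A sketch using the table defined above:

```sql
-- Served by Hive's fetch task on the client; the client-side jets3t
-- settings apply, so this succeeds.
SELECT * FROM tempbatting LIMIT 10;

-- Forces a map-reduce job; each mapper connects to S3 itself and,
-- without the custom endpoint configured on the worker nodes,
-- defaults to <bucket>.s3.amazonaws.com.
SELECT COUNT(*) FROM tempbatting;
```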

1 ACCEPTED SOLUTION

12 REPLIES

Master Mentor
@Phoncy Joseph

This is from the AWS forums:

"I tried to rerun my job again and this time it finished successfully. So I guess it may be related to the S3 service being unstable in the recent two days, at least judging from the error message.

I hope this issue doesn't happen again."

link

Rising Star

I'm using a custom S3 for Eucalyptus, not the AWS one. I have been trying to resolve this for the past few weeks.

Master Guru

OK, just to repeat: you can access S3 through Hive with simple queries? So it cannot be a connection problem, right?

Perhaps too many parallel connections timing out when all the mappers spin up?

Do you see some tasks successfully completing and then some tasks failing after 3 retries? In this case it sounds like a timeout issue.

I have seen some similar issues on Google where people tried to fix this by increasing connection timeouts and retries, though mostly in Presto forums.

However, there are S3 parameters available in the core-site configuration:

https://hadoop.apache.org/docs/r2.6.3/hadoop-project-dist/hadoop-common/core-default.xml

fs.s3a.connection.timeout
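A sketch of how those parameters could be raised in core-site.xml, assuming the s3a connector is in use; the property names come from the core-default.xml page linked above, and the values are illustrative, not tuned:

```xml
<!-- core-site.xml: raise s3a connection limits and timeouts (values illustrative) -->
<property>
  <name>fs.s3a.connection.timeout</name>
  <value>200000</value> <!-- socket timeout, in milliseconds -->
</property>
<property>
  <name>fs.s3a.connection.maximum</name>
  <value>100</value> <!-- max simultaneous connections to S3 -->
</property>
<property>
  <name>fs.s3a.attempts.maximum</name>
  <value>20</value> <!-- retries on transient errors before failing the task -->
</property>
```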

Rising Star

Thanks for the response. Yes, I'm able to access S3 through simple Hive queries. From the logs, I could see that the map-reduce job is trying to connect to "hive-bucket.s3.amazonaws.com:443", which doesn't exist. I need to connect to a custom S3 endpoint, which is "s3-customlocation.net". I have gone through the core-site configuration, but I couldn't find any parameter to set a custom endpoint.

Master Guru

https://issues.apache.org/jira/browse/HADOOP-11261

Which version of Hadoop are you using?

"It also enables using a custom url pointing to an S3-compatible object store."

Rising Star

I'm using Hadoop 2.6.

Master Guru

I only understand half of the S3 problems, but it might be that you need to upgrade if a custom URL is what you want.

https://issues.apache.org/jira/browse/HADOOP-11261

"It also enables using a custom url pointing to an S3-compatible object store."

Master Mentor

@Phoncy Joseph any progress on this?

Rising Star

@Artem Ervits Copied jets3t.properties to all data nodes. Currently I'm getting below exception:

org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message: <?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/hive-bucket</Resource><RequestId></RequestId></Error>
        at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:470)
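The 403 suggests the request is now reaching the object store but the credentials are being rejected. For the s3n connector, the AWS-style access and secret keys are usually supplied via core-site.xml; the property names below are the standard ones for NativeS3FileSystem, and the values are placeholders:

```xml
<!-- core-site.xml: credentials for the s3n connector (values are placeholders) -->
<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```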