Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Hive aggregate query failing for External table

avatar
Rising Star

I have a Hadoop cluster(HDP 2.2) set-up in Eucalyptus environment. I have created an external table in Hive(0.14), using the below query:

CREATE EXTERNAL TABLE tempbatting (col_value STRING) LOCATION 's3n://hive-bucket/';

I'm using a custom S3 location, so I have set jets3t property in Hive configuration directory as below:

set s3service.https-only = true;
set s3service.s3-endpoint = s3-customlocation.net;
set s3service.s3-endpoint-http-port   = 80;
set s3service.s3-endpoint-https-port = 443;
set s3service.disable-dns-buckets = true;
set s3service.enable-storage-classes = false;

Though I'm able to execute simple select queries on the table successfully, the aggregate queries are failing. Below are the logs:

Error: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
 
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to hive-bucket.s3.amazonaws.com:443 timed out
        at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:416)

From the logs, the map-reduce job seems to access Amazon S3. I have tried using the set command for Hive(set fs.s3n.endpoint=s3-customlocation.net), but it didn't seem to work. Is there a way to specify custom end-point?

1 ACCEPTED SOLUTION

avatar
Master Mentor
12 REPLIES 12

avatar
Master Mentor

avatar
Rising Star

Though have not yet upgraded to Hadoop 2.7, I made the configuration changes for s3a as per the documentation. On executing Hive create query, I got the below exception:

FAILED: AmazonClientException Unable to execute HTTP request: Connect to hive-bucket.s3.amazonaws.com:443 timed out

avatar
Rising Star

I have upgraded to Hadoop 2.7 now. I have done configurations changes for s3a and the queries are executing successfully. Thank you.