Archives of Support Questions (Read Only)

phoncy_joseph · ‎02-17-2016

I have a Hadoop cluster (HDP 2.2) set up in a Eucalyptus environment. I created an external table in Hive (0.14) using the query below:

CREATE EXTERNAL TABLE tempbatting (col_value STRING) LOCATION 's3n://hive-bucket/';

I'm using a custom S3 location, so I have set the following jets3t properties in the Hive configuration directory:

set s3service.https-only = true;
set s3service.s3-endpoint = s3-customlocation.net;
set s3service.s3-endpoint-http-port   = 80;
set s3service.s3-endpoint-https-port = 443;
set s3service.disable-dns-buckets = true;
set s3service.enable-storage-classes = false;

Simple select queries on the table execute successfully, but aggregate queries fail. The logs are below:

Error: java.io.IOException: java.lang.reflect.InvocationTargetException
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
        at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
 
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to hive-bucket.s3.amazonaws.com:443 timed out
        at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:416)

From the logs, the MapReduce job seems to be trying to reach Amazon S3 (hive-bucket.s3.amazonaws.com) instead of my custom endpoint. I tried the Hive set command (set fs.s3n.endpoint=s3-customlocation.net), but it didn't seem to work. Is there a way to specify a custom endpoint?

aervits · ‎02-22-2016

Try the s3a connector: https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/index.html

phoncy_joseph · ‎02-26-2016

Though I have not yet upgraded to Hadoop 2.7, I made the configuration changes for s3a as per the documentation. On executing the Hive create query, I got the exception below:

FAILED: AmazonClientException Unable to execute HTTP request: Connect to hive-bucket.s3.amazonaws.com:443 timed out

phoncy_joseph · ‎03-22-2016

I have upgraded to Hadoop 2.7 now. I made the configuration changes for s3a, and the queries are executing successfully. Thank you.
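
The exact settings used are not given in this thread. As a rough sketch of what an s3a configuration for a custom endpoint might look like in core-site.xml, assuming the property names from the hadoop-aws documentation linked above, the endpoint host from the original question, and placeholder credentials:

```xml
<!-- Sketch only: values are assumptions, not the poster's actual configuration. -->
<property>
  <!-- Custom S3-compatible endpoint (host taken from the question above) -->
  <name>fs.s3a.endpoint</name>
  <value>s3-customlocation.net</value>
</property>
<property>
  <!-- Use HTTPS when talking to the endpoint -->
  <name>fs.s3a.connection.ssl.enabled</name>
  <value>true</value>
</property>
<property>
  <!-- Placeholder credentials: replace with real keys -->
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```

With settings along these lines, the table location would presumably use the s3a scheme (for example 's3a://hive-bucket/') rather than s3n, and the services would need a restart to pick up the change.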

Cloudera Community

Hive aggregate query failing for External table