Created 02-17-2016 12:35 PM
I have a Hadoop cluster (HDP 2.2) set up in a Eucalyptus environment. I created an external table in Hive (0.14) using the query below:
CREATE EXTERNAL TABLE tempbatting (col_value STRING) LOCATION 's3n://hive-bucket/';
I'm using a custom S3 endpoint, so I have set the following jets3t properties in the Hive configuration directory:

s3service.https-only=true
s3service.s3-endpoint=s3-customlocation.net
s3service.s3-endpoint-http-port=80
s3service.s3-endpoint-https-port=443
s3service.disable-dns-buckets=true
s3service.enable-storage-classes=false
Though I'm able to execute simple SELECT queries on the table successfully, aggregate queries are failing. Below are the logs:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to hive-bucket.s3.amazonaws.com:443 timed out
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:416)
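To illustrate the failure pattern (the query shapes here are illustrative, not the exact queries I ran; `tempbatting` is the table created above):

```sql
-- Works: a plain projection can be served by a simple fetch task
SELECT col_value FROM tempbatting LIMIT 10;

-- Fails: the aggregate launches a MapReduce job, whose tasks
-- time out connecting to hive-bucket.s3.amazonaws.com:443
SELECT COUNT(*) FROM tempbatting;
```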
From the logs, the MapReduce job seems to be accessing Amazon S3 (hive-bucket.s3.amazonaws.com) instead of the custom endpoint. I tried using the Hive set command (set fs.s3n.endpoint=s3-customlocation.net), but it didn't seem to work. Is there a way to specify a custom endpoint?
Created 02-22-2016 12:58 PM
Created 02-26-2016 06:49 AM
Though I have not yet upgraded to Hadoop 2.7, I made the configuration changes for s3a as per the documentation. On executing the Hive CREATE query, I got the exception below:
FAILED: AmazonClientException Unable to execute HTTP request: Connect to hive-bucket.s3.amazonaws.com:443 timed out
Created 03-22-2016 06:00 AM
I have now upgraded to Hadoop 2.7. I made the configuration changes for s3a, and the queries are executing successfully. Thank you.
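For anyone hitting the same issue, the relevant s3a settings go in core-site.xml along these lines (a sketch, not my exact configuration; the credential values are placeholders, and the endpoint mirrors the custom location above):

```xml
<!-- Point the s3a connector at the custom endpoint (Hadoop 2.7+) -->
<property>
  <name>fs.s3a.endpoint</name>
  <value>s3-customlocation.net</value>
</property>
<!-- Placeholder credentials for the custom S3 service -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
```

The table LOCATION also needs to use the s3a scheme (e.g. 's3a://hive-bucket/' instead of 's3n://hive-bucket/') so that the s3a connector, rather than jets3t-backed s3n, handles the requests.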