Hive aggregate query failing for External table
Labels: Apache Hive
Created ‎02-17-2016 12:35 PM
I have a Hadoop cluster (HDP 2.2) set up in a Eucalyptus environment. I have created an external table in Hive (0.14) using the query below:
CREATE EXTERNAL TABLE tempbatting (col_value STRING) LOCATION 's3n://hive-bucket/';
I'm using a custom S3 location, so I have set the jets3t properties in the Hive configuration directory as below:
set s3service.https-only = true;
set s3service.s3-endpoint = s3-customlocation.net;
set s3service.s3-endpoint-http-port = 80;
set s3service.s3-endpoint-https-port = 443;
set s3service.disable-dns-buckets = true;
set s3service.enable-storage-classes = false;
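(For reference, a plain jets3t.properties file would normally use bare key=value entries rather than the set ...; syntax; a minimal sketch with the same values:)

s3service.https-only=true
s3service.s3-endpoint=s3-customlocation.net
s3service.s3-endpoint-http-port=80
s3service.s3-endpoint-https-port=443
s3service.disable-dns-buckets=true
s3service.enable-storage-classes=false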
Though I'm able to execute simple select queries on the table successfully, the aggregate queries are failing. Below are the logs:
Error: java.io.IOException: java.lang.reflect.InvocationTargetException
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderCreationException(HiveIOExceptionHandlerChain.java:97)
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderCreationException(HiveIOExceptionHandlerUtil.java:57)
Caused by: org.apache.http.conn.ConnectTimeoutException: Connect to hive-bucket.s3.amazonaws.com:443 timed out
    at org.apache.http.conn.ssl.SSLSocketFactory.connectSocket(SSLSocketFactory.java:416)
From the logs, the map-reduce job appears to be connecting to Amazon S3. I have tried using the set command in Hive (set fs.s3n.endpoint=s3-customlocation.net), but it didn't seem to work. Is there a way to specify a custom endpoint?
Created ‎02-17-2016 12:38 PM
This is from the AWS forums:
"I tried to rerun my job again and this time it finished successfully. So I guess it may be related to the S3 service being unstable over the recent two days, at least going by the error message. I hope this issue does not happen again."
Created ‎02-17-2016 12:41 PM
I'm using a custom S3 for Eucalyptus, not the AWS one. I have been trying to resolve this for the past few weeks.
Created ‎02-17-2016 12:57 PM
OK, just to repeat: you can access S3 through Hive with simple queries? So it cannot be a basic connectivity problem, right?
Perhaps too many parallel connections are timing out when all the mappers spin up?
Do you see some tasks completing successfully and then some failing after 3 retries? In that case it sounds like a timeout issue.
I have seen similar issues reported elsewhere (mostly in Presto forums) where people tried to fix them by increasing connection timeouts and retries.
However, there are S3 parameters available in the Hadoop configuration, e.g. fs.s3a.connection.timeout:
https://hadoop.apache.org/docs/r2.6.3/hadoop-project-dist/hadoop-common/core-default.xml
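A minimal core-site.xml sketch for raising that timeout (the 200000 ms value here is only illustrative, not a recommendation from this thread):

<property>
  <name>fs.s3a.connection.timeout</name>
  <value>200000</value>
</property>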
Created ‎02-17-2016 01:08 PM
Thanks for the response. Yes, I'm able to access S3 through simple Hive queries. From the logs, I can see that the map-reduce job is trying to connect to "hive-bucket.s3.amazonaws.com:443", which doesn't exist. I need to connect to a custom S3 endpoint, which is "s3-customlocation.net". I have gone through the hdfs-site configuration, but I couldn't find any parameter to set a custom endpoint.
Created ‎02-17-2016 01:33 PM
https://issues.apache.org/jira/browse/HADOOP-11261
Which version of Hadoop are you using?
"It also enables using a custom url pointing to an S3-compatible object store."
Created ‎02-17-2016 04:34 PM
I'm using Hadoop 2.6.
Created ‎02-17-2016 07:23 PM
I only understand half of these S3 problems, but it might be that you need to upgrade if a custom URL is what you want.
https://issues.apache.org/jira/browse/HADOOP-11261
"It also enables using a custom url pointing to an S3-compatible object store."
- Fix Version/s: 2.7.0
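If an upgrade is an option: starting with Hadoop 2.7, a custom endpoint would normally go into core-site.xml via fs.s3a.endpoint (and the table LOCATION would then use the s3a:// scheme instead of s3n://). A sketch, assuming the endpoint mentioned earlier in this thread:

<property>
  <name>fs.s3a.endpoint</name>
  <value>s3-customlocation.net</value>
</property>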
Created ‎02-20-2016 06:19 PM
@phoncy Joseph any progress on this?
Created ‎02-22-2016 10:09 AM
@Artem Ervits I copied jets3t.properties to all data nodes. Currently I'm getting the below exception:
org.apache.hadoop.fs.s3.S3Exception: org.jets3t.service.ServiceException: S3 Error Message. -- ResponseCode: 403, ResponseStatus: Forbidden, XML Error Message:
<?xml version="1.0" encoding="UTF-8"?><Error><Code>AccessDenied</Code><Message>Access Denied</Message><Resource>/hive-bucket</Resource><RequestId></RequestId></Error>
    at org.apache.hadoop.fs.s3native.Jets3tNativeFileSystemStore.processException(Jets3tNativeFileSystemStore.java:470)
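In case it helps: a 403 AccessDenied from s3n usually means the request reached the object store but was rejected, so it may be worth checking that the access keys are visible on the task nodes as well. On Hadoop 2.6 the s3n keys would normally be set in core-site.xml, along these lines (placeholder values, not taken from this thread):

<property>
  <name>fs.s3n.awsAccessKeyId</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3n.awsSecretAccessKey</name>
  <value>YOUR_SECRET_KEY</value>
</property>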
