Created 02-28-2016 09:23 PM
Does anyone know how to make hive default to S3 so each table does not need to be external? Is this possible?
There are articles such as http://blog.sequenceiq.com/blog/2014/11/17/datalake-cloudbreak-2/ which indicate this is possible, but when I try it with HDP 2.3, it appears HiveServer2 fails when the s3 warehouse location is accessed through WebHDFS.
I set the hive.metastore.warehouse.dir=s3://<bucket>/warehouse and restarted
From the call below I'm willing to bet webhdfs is barfing on the syntax. Any ideas?
2016-02-28 14:09:36,339 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X PUT --negotiate -u : '"'"'http://<server>:50070/webhdfs/v1s3:/<bucket>/warehouse?op=MKDIRS&user.name=hdfs'"'"' 1>/tmp/tmp_QkaO7 2>/tmp/tmpFSumMx''] {'logoutput': None, 'quiet': False}
2016-02-28 14:09:36,360 - call returned (0, '')
Created 02-28-2016 09:49 PM
Please refer to the following articles:
https://hadoop.apache.org/docs/r2.7.1/hadoop-aws/tools/hadoop-aws/index.htm
https://cwiki.apache.org/confluence/display/Hive/HiveAws+HivingS3nRemotely
Created 02-29-2016 12:20 AM
Your question: Does anyone know how to make hive default to S3 so each table does not need to be external? Is this possible?
Regarding managed tables on S3, see the following:
In theory, HDP can be set up to use S3 as the default filesystem (instead of HDFS).
Detailed instructions on how to replace HDFS with S3 are given here.
http://wiki.apache.org/hadoop/AmazonS3
At a high level, the "fs.defaultFS" property has to be set to point to S3 in core-site.xml.
The default setting for this property looks like this:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoopNamenode:8020</value>
</property>
Change it to the setting below:
<property>
  <name>fs.defaultFS</name>
  <value>s3://BUCKET</value>
</property>
In addition to setting the default filesystem to S3, we also have to provide the AWS access key ID and the AWS secret access key. Both settings are shown below:
<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ID</value>
</property>
<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET</value>
</property>
Hive Tables in S3
A Hive table that uses “S3” as storage can be created as below:
CREATE TABLE SRC_TABLE (
  COL1 string,
  COL2 string,
  COL3 string
)
ROW FORMAT DELIMITED
STORED AS TEXTFILE
LOCATION 's3://BUCKET_NAME/user/root/src_table';
The only difference here is that we specify the location of the table as a sub-folder under "s3://BUCKET_NAME".
Data can be loaded into this table using the Hive command:
hive> LOAD DATA LOCAL INPATH 'local_table.csv' INTO TABLE SRC_TABLE;
The path "s3://BUCKET_NAME/user/root/src_table" can be treated like any path in HDFS and can be used with Hive/Pig/MapReduce, etc.
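For example, once the filesystem is wired up you should be able to inspect and query that location straight from the Hive CLI (a minimal sketch using the placeholder bucket and table names from above):
hive> dfs -ls s3://BUCKET_NAME/user/root/src_table;
hive> SELECT COUNT(*) FROM SRC_TABLE;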
Created 02-29-2016 12:22 AM
Hi @Matt Davies, see this blog: http://blog.sequenceiq.com/blog/2014/11/17/datalake-cloudbreak-2/
You can set "hive.metastore.warehouse.dir": "s3://siq-hadoop/apps/hive/warehouse",
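If you set it outside of a blueprint, the equivalent hive-site.xml entry should look something like this (the bucket path is just the example from the blog; substitute your own):
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>s3://siq-hadoop/apps/hive/warehouse</value>
</property>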
Can you share more details from the HS2 logs?