Making Hive default to S3

New Contributor

Does anyone know how to make Hive default to S3 so each table does not need to be external? Is this possible?

There are articles such as http://blog.sequenceiq.com/blog/2014/11/17/datalake-cloudbreak-2/ which indicate this is possible, but when one does this with HDP 2.3, HiveServer2 appears to fail when trying to access a WebHDFS location that includes the s3 path.

I set hive.metastore.warehouse.dir=s3://<bucket>/warehouse and restarted.

From the call below, I'm willing to bet WebHDFS is barfing on the syntax (note the mangled path webhdfs/v1s3:/<bucket>/warehouse). Any ideas?

2016-02-28 14:09:36,339 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl -sS -L -w '"'"'%{http_code}'"'"' -X PUT --negotiate -u : '"'"'http://<server>:50070/webhdfs/v1s3:/<bucket>/warehouse?op=MKDIRS&user.name=hdfs'"'"' 1>/tmp/tmp_QkaO7 2>/tmp/tmpFSumMx''] {'logoutput': None, 'quiet': False}
2016-02-28 14:09:36,360 - call returned (0, '')
3 REPLIES

Master Mentor
@Matt Davies

Your question: Does anyone know how to make Hive default to S3 so each table does not need to be external? Is this possible?

Locally managed table to S3 ... see this:

Using S3 as the default FS

In theory, HDP can be set up to use S3 as the default filesystem (instead of HDFS).

Detailed instructions on how to replace HDFS with S3 are given here: http://wiki.apache.org/hadoop/AmazonS3

At a high level, the "fs.defaultFS" property has to be set to point to S3 in core-site.xml.

The default setting for this property looks like this:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoopNamenode:8020</value>
</property>

Change it to the setting below:

<property>
  <name>fs.defaultFS</name>
  <value>s3://BUCKET</value>
</property>

In addition to setting the default filesystem to S3, we also have to provide the AWS access key ID and AWS secret access key. Both settings are shown below:

<property>
  <name>fs.s3.awsAccessKeyId</name>
  <value>ID</value>
</property>

<property>
  <name>fs.s3.awsSecretAccessKey</name>
  <value>SECRET</value>
</property>
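
Once these properties are in place, a quick sanity check (a sketch, assuming the bucket exists and the credentials are valid) is to list the bucket from the Hadoop command line:

# should list the bucket if the default FS and keys are being picked up
hadoop fs -ls s3://BUCKET/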

Hive Tables in S3

A Hive table that uses S3 as storage can be created as below:

CREATE TABLE SRC_TABLE (
  COL1 string,
  COL2 string,
  COL3 string
)
ROW FORMAT DELIMITED
STORED AS TEXTFILE
LOCATION 's3://BUCKET_NAME/user/root/src_table';

The only difference here is that we specify the location of the table to be a sub-folder under "s3://BUCKET_NAME".

Data can be loaded into this table using the Hive CLI:

hive> LOAD DATA LOCAL INPATH 'local_table.csv' INTO TABLE SRC_TABLE;

The path "s3://BUCKET_NAME/user/root/src_table" can be treated like any path in HDFS and can be used with Hive/Pig/MapReduce, etc.
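
For example (a sketch, assuming the table above has been created and loaded), it can be queried like any other Hive table:

hive> SELECT COUNT(*) FROM SRC_TABLE;
hive> SELECT COL1, COL2 FROM SRC_TABLE LIMIT 10;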

Master Mentor

Hi @Matt Davies, see this blog: http://blog.sequenceiq.com/blog/2014/11/17/datalake-cloudbreak-2/

You can set "hive.metastore.warehouse.dir": "s3://siq-hadoop/apps/hive/warehouse".
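
In hive-site.xml terms, that would look roughly like the following (a sketch; the bucket and path are the ones from that blog, so substitute your own):

<property>
  <!-- point the default Hive warehouse at an S3 location -->
  <name>hive.metastore.warehouse.dir</name>
  <value>s3://siq-hadoop/apps/hive/warehouse</value>
</property>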

Can you share more details from HS2 logs?