Created on 12-18-2015 08:16 PM
If you wish to reference a file in S3 from a Pig script, you might do something like this:
set fs.s3n.awsSecretAccessKey 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
set fs.s3n.awsAccessKeyId 'xxxxxxxxxxxxxxxxxxxxx';
A = load 's3n://<bucket>/<path-to-file>' USING TextLoader;
If you're on HDP 2.2.6, you'll likely see this error:
Error: java.io.IOException: No FileSystem for scheme: s3n
The following steps resolve this issue:
In core-site.xml add:
<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  <description>The FileSystem for s3n: (Native S3) uris.</description>
</property>
Then add the following to the MR2 and/or Tez classpath(s):
/usr/hdp/${hdp.version}/hadoop-mapreduce/*
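A minimal sketch of how those classpath entries might look, assuming an Ambari-managed cluster where MR2 uses mapreduce.application.classpath (mapred-site.xml) and Tez uses tez.cluster.additional.classpath.prefix (tez-site.xml); merge the value with whatever your cluster already has rather than replacing it:
<!-- mapred-site.xml: keep existing entries and append the hadoop-mapreduce directory -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>...:/usr/hdp/${hdp.version}/hadoop-mapreduce/*</value>
</property>
<!-- tez-site.xml: prefix added to every Tez container's classpath -->
<property>
  <name>tez.cluster.additional.classpath.prefix</name>
  <value>/usr/hdp/${hdp.version}/hadoop-mapreduce/*</value>
</property>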
These configs ensure two things:
- That the worker YARN containers spawned by Pig have access to the hadoop-aws.jar file
- That the worker YARN containers know which class implements the file system type identified by "s3n://" (see the quick check below)
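Before re-running the Pig job, one way to sanity-check the setup is to list the path from the shell; a hedged sketch, where the -D generic options inject the same credentials the script sets and <bucket>/<path-to-file> are the placeholders from above. Note this only verifies that the s3n scheme and credentials resolve on the gateway node; the container classpath is exercised only once the job actually runs.
hadoop fs -Dfs.s3n.awsAccessKeyId=xxxxxxxxxxxxxxxxxxxxx -Dfs.s3n.awsSecretAccessKey=xxxxxxxxxxxxxxxxxxxxxxxxxxxxx -ls s3n://<bucket>/<path-to-file>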
Created on 11-08-2016 10:49 PM
Spoke to the author. This is still definitely relevant to HDP 2.2, and I believe HDP 2.3 as well.
Created on 12-13-2016 11:57 PM
s3n is deprecated in newer versions of Hadoop (see https://wiki.apache.org/hadoop/AmazonS3), so it's better to use s3a. To use s3a, prefix the path with s3a:// when accessing files.
The following properties need to be configured first:
<property>
  <name>fs.s3a.access.key</name>
  <value>ACCESS-KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>SECRET-KEY</value>
</property>
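With those properties in core-site.xml, the original Pig snippet needs no set statements; a sketch mirroring the example above, with bucket and path still placeholders:
-- credentials are picked up from core-site.xml; they could also be passed
-- inline with set fs.s3a.access.key / set fs.s3a.secret.key
A = load 's3a://<bucket>/<path-to-file>' USING TextLoader;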