
If you want to reference a file in S3 from a Pig script, you might do something like this:

set fs.s3n.awsSecretAccessKey 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
set fs.s3n.awsAccessKeyId 'xxxxxxxxxxxxxxxxxxxxx';
A = LOAD 's3n://<bucket>/<path-to-file>' USING TextLoader();
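
Note that the LOAD alone is lazy; nothing actually touches S3 until you run an operation such as a DUMP. A minimal sketch (the alias B and the limit of 10 are arbitrary):

B = LIMIT A 10;
DUMP B;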

If you're on HDP 2.2.6, you'll likely see this error:

Error: java.io.IOException: No FileSystem for scheme: s3n

The following steps resolve this issue:

In core-site.xml add:

<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  <description>The FileSystem for s3n: (Native S3) uris.</description>
</property>

Then add the following to the MR2 and/or Tez classpath(s):

/usr/hdp/${hdp.version}/hadoop-mapreduce/*
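
For MR2, this typically means appending the entry to mapreduce.application.classpath in mapred-site.xml; a sketch, keeping whatever entries your cluster already has in place ("[existing entries]" is a placeholder):

<property>
  <name>mapreduce.application.classpath</name>
  <value>[existing entries]:/usr/hdp/${hdp.version}/hadoop-mapreduce/*</value>
</property>

On HDP, the analogous Tez setting is usually tez.cluster.additional.classpath.prefix in tez-site.xml.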

These configs ensure two things:

  1. That the worker YARN containers spawned by Pig have access to the hadoop-aws.jar file
  2. That the worker YARN containers know which class implements the file system scheme identified by "s3n://"
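
As a quick way to verify the fs.s3n.impl mapping, you can list the path from the client shell. Note this runs in the local JVM, so it confirms the core-site.xml change but not the container classpath (it also assumes the S3 keys are set in core-site.xml or passed as -D options):

hadoop fs -ls s3n://<bucket>/<path-to-file>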

Comments

Spoke to the author. This is still definitely relevant to HDP 2.2, and I believe to HDP 2.3 as well.


s3n is deprecated in newer versions of Hadoop (see https://wiki.apache.org/hadoop/AmazonS3), so it's better to use s3a: prefix paths with s3a:// when accessing files.

The following properties need to be configured first:

<property>
  <name>fs.s3a.access.key</name>
  <value>ACCESS-KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>SECRET-KEY</value>
</property>
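
With those in place, the load looks the same as the example above, just with the s3a scheme:

A = LOAD 's3a://<bucket>/<path-to-file>' USING TextLoader();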