Created on 12-18-2015 08:16 PM
If you wish to reference a file in S3 from a Pig script, you might do something like this:
set fs.s3n.awsSecretAccessKey 'xxxxxxxxxxxxxxxxxxxxxxxxxxxxx';
set fs.s3n.awsAccessKeyId 'xxxxxxxxxxxxxxxxxxxxx';
A = load 's3n://<bucket>/<path-to-file>' USING TextLoader;
If you're on HDP 2.2.6, you'll likely see this error:
Error: java.io.IOException: No FileSystem for scheme: s3n
The following steps resolve this issue:
In core-site.xml add:
<property>
  <name>fs.s3n.impl</name>
  <value>org.apache.hadoop.fs.s3native.NativeS3FileSystem</value>
  <description>The FileSystem for s3n: (Native S3) uris.</description>
</property>
Then add the following to the MR2 and/or Tez classpath(s):
/usr/hdp/${hdp.version}/hadoop-mapreduce/*
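A minimal sketch of how those classpath entries might look, assuming an Ambari-managed cluster where MR2 uses mapreduce.application.classpath (mapred-site.xml) and Tez uses tez.cluster.additional.classpath.prefix (tez-site.xml); merge the value with whatever your cluster already has rather than replacing it:
<!-- mapred-site.xml: keep existing entries and append the hadoop-mapreduce directory -->
<property>
  <name>mapreduce.application.classpath</name>
  <value>...:/usr/hdp/${hdp.version}/hadoop-mapreduce/*</value>
</property>
<!-- tez-site.xml: prefix added to every Tez container's classpath -->
<property>
  <name>tez.cluster.additional.classpath.prefix</name>
  <value>/usr/hdp/${hdp.version}/hadoop-mapreduce/*</value>
</property>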
These configs ensure two things:
- That the worker YARN containers spawned by Pig have access to the hadoop-aws.jar file
- That the worker YARN containers know which class implements the file system type identified by "s3n://" (see the quick check below)
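Before re-running the Pig job, one way to sanity-check the setup is to list the path from the shell; a hedged sketch, where the -D generic options inject the same credentials the script sets and <bucket>/<path-to-file> are the placeholders from above. Note this only verifies that the s3n scheme and credentials resolve on the gateway node; the container classpath is exercised only once the job actually runs.
hadoop fs -Dfs.s3n.awsAccessKeyId=xxxxxxxxxxxxxxxxxxxxx -Dfs.s3n.awsSecretAccessKey=xxxxxxxxxxxxxxxxxxxxxxxxxxxxx -ls s3n://<bucket>/<path-to-file>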
Created on 11-08-2016 10:49 PM
Spoke to the author. This is still definitely relevant to HDP 2.2, and I believe HDP 2.3 as well.
Created on 12-13-2016 11:57 PM
s3n is deprecated in newer versions of Hadoop (see https://wiki.apache.org/hadoop/AmazonS3), so it's better to use s3a. To use s3a, prefix the path with s3a:// when accessing files.
The following properties need to be configured first:
<property>
  <name>fs.s3a.access.key</name>
  <value>ACCESS-KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>SECRET-KEY</value>
</property>
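With those properties in core-site.xml, the original Pig snippet needs no set statements; a sketch mirroring the example above, with bucket and path still placeholders:
-- credentials are picked up from core-site.xml; they could also be passed
-- inline with set fs.s3a.access.key / set fs.s3a.secret.key
A = load 's3a://<bucket>/<path-to-file>' USING TextLoader;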