03-07-2017
02:22 AM
Prerequisite:
Create an account in S3 and obtain the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

AWS Command Line:
For the AWS command line to work, have AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY configured in ~/.aws/credentials. Something like:

[default]
aws_access_key_id=$AWS_ACCESS_KEY_ID
aws_secret_access_key=$AWS_SECRET_ACCESS_KEY

You might also want to set the region and output format in ~/.aws/config. Something like:

[default]
region=us-west-2
output=json
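As a minimal sketch of the two files above (assuming a POSIX shell; the local aws-demo directory stands in for ~/.aws, and the $PLACEHOLDERS stand in for your actual keys):

```shell
# Sketch: generate the AWS CLI credentials and config files described above.
# Writes into ./aws-demo here; for a real setup, target ~/.aws instead.
mkdir -p aws-demo

# Quoted heredoc delimiter keeps the $PLACEHOLDERS literal.
cat > aws-demo/credentials <<'EOF'
[default]
aws_access_key_id=$AWS_ACCESS_KEY_ID
aws_secret_access_key=$AWS_SECRET_ACCESS_KEY
EOF

cat > aws-demo/config <<'EOF'
[default]
region=us-west-2
output=json
EOF
```

In practice you can also just run `aws configure`, which prompts for the key pair, region, and output format and writes both files for you.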
Steps:
1. Create a bucket in S3. You can create it on the Amazon Console (see CreatingABucket.html) or from the command line:
   aws s3 mb s3://$BUCKET_NAME
2. Modify the below properties in core-site.xml:
   - fs.defaultFS to s3a://$BUCKET_NAME
   - fs.s3a.access.key to $AWS_ACCESS_KEY_ID
   - fs.s3a.secret.key to $AWS_SECRET_ACCESS_KEY
   - fs.AbstractFileSystem.s3a.impl to org.apache.hadoop.fs.s3a.S3A (HADOOP-11262)
3. You might also want to set the properties below if you need to run some example jobs:
   - tez.staging-dir in tez-site.xml to hdfs://$NN_HOST:8020/tmp/$user_name/staging (TEZ-3276)
   - hive.exec.scratchdir to hdfs://$NN_HOST:8020/tmp/hive (for running Hive on Tez)
4. Restart HDFS, YARN, and MapReduce2.

You should now be able to use S3 as the default filesystem.
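The core-site.xml changes in step 2 can be sketched as the following fragment; $BUCKET_NAME and the two key placeholders stand in for your own values:

```xml
<!-- Sketch of the core-site.xml properties described above. -->
<property>
  <name>fs.defaultFS</name>
  <value>s3a://$BUCKET_NAME</value>
</property>
<property>
  <name>fs.s3a.access.key</name>
  <value>$AWS_ACCESS_KEY_ID</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>$AWS_SECRET_ACCESS_KEY</value>
</property>
<!-- Needed so the s3a scheme resolves as an AbstractFileSystem (HADOOP-11262). -->
<property>
  <name>fs.AbstractFileSystem.s3a.impl</name>
  <value>org.apache.hadoop.fs.s3a.S3A</value>
</property>
```

Note that storing the secret key in plain text in core-site.xml exposes it to anyone who can read the file; treat these values as sensitive.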