Member since: 04-10-2015
Posts: 4
Kudos Received: 2
Solutions: 1
My Accepted Solutions

Title | Views | Posted
---|---|---
 | 6718 | 04-14-2015 06:51 AM
04-15-2015 02:55 AM

Thanks Harsh. Actually I tried s3a, but it throws a filesystem exception: "java.io.IOException: No FileSystem for scheme: s3a". It looks like a JAR conflict issue, though I haven't had a chance to dig deeper.
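For anyone hitting the same error: "No FileSystem for scheme: s3a" commonly means the hadoop-aws JAR (and the AWS SDK it depends on) is not on the classpath. A minimal sketch of that workaround, assuming a Hadoop 2.6-era install; the jar paths and versions are illustrative and will vary per cluster:

```sh
# Put hadoop-aws and its AWS SDK dependency on the client classpath
# (paths/versions are illustrative; the jars must also be available on worker nodes).
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hadoop/hadoop-aws-2.6.0.jar:/usr/lib/hadoop/lib/aws-java-sdk-1.7.4.jar

# Explicitly map the s3a scheme to its implementation for this job
hadoop distcp \
  -Dfs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  -Dfs.s3a.access.key=ACCESS_ID \
  -Dfs.s3a.secret.key=ACCESS_KEY \
  hdfs:///test/data/export/file.avro s3a://S3_BUCKET/
```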
04-14-2015 06:51 AM

1 Kudo

Alright! I figured out the fix for this. The temp buffer directory for S3 is configurable with the property "fs.s3.buffer.dir"; its default is defined in the core-default.xml config file as shown below.

```xml
<property>
  <name>fs.s3.buffer.dir</name>
  <value>${hadoop.tmp.dir}/s3</value>
  <description>Determines where on the local filesystem the S3 filesystem
    should store files before sending them to S3
    (or after retrieving them from S3).
  </description>
</property>
```

This doesn't require any service restart, so it's an easy fix.
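Since fs.s3.buffer.dir is read from the job configuration, it should also be overridable per job on the distcp command line rather than in the config files. A sketch under that assumption; the /mnt/scratch/s3 path is hypothetical and must exist and be writable on every node:

```sh
# Redirect the S3 upload buffer to a larger local volume for this job only
hadoop distcp \
  -Dfs.s3.buffer.dir=/mnt/scratch/s3 \
  -Dmapreduce.map.memory.mb=3096 \
  hdfs:///test/data/export/file.avro \
  s3n://ACCESS_ID:ACCESS_KEY@S3_BUCKET/
```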
04-10-2015 12:27 AM

1 Kudo

Hi, I am using the following command to transfer data from HDFS to S3:

```sh
hadoop distcp -Dmapreduce.map.memory.mb=3096 -Dmapred.task.timeout=60000000 \
  -i -log /tmp/export/logs \
  hdfs:///test/data/export/file.avro \
  s3n://ACCESS_ID:ACCESS_KEY@S3_BUCKET/
```

What I have noticed is that the mapper tasks that copy data to S3 first buffer it locally in the /tmp/hadoop-yarn/s3 directory on each node. This is causing disk space issues on the nodes, since the data being transferred is several TB. Is there a way to configure the mappers' temporary working directory? Can it use HDFS disk space rather than local disk space? Thanks in advance. Jagdish
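To confirm where the buffering happens, watching the local volume on a worker node while a map task runs makes the behavior visible. An illustrative check (the path matches the one reported above):

```sh
ls -lh /tmp/hadoop-yarn/s3/   # buffer files written before the upload to S3
df -h /tmp                    # local volume filling up during the copy
```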
... View more
Labels:
- Apache Hadoop
- Apache YARN
- HDFS