Member since: 04-10-2015
Posts: 4
Kudos Received: 2
Solutions: 1
My Accepted Solutions

Title | Views | Posted
---|---|---
 | 6718 | 04-14-2015 06:51 AM
04-15-2015 02:55 AM

Thanks Harsh. Actually I tried s3a, but it throws a filesystem exception: "java.io.IOException: No FileSystem for scheme: s3a". It looks like a JAR conflict issue, though I haven't had a chance to dig deeper.
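For anyone hitting the same error: "No FileSystem for scheme: s3a" commonly means the hadoop-aws JAR (and the AWS SDK it depends on) is not on the classpath. A minimal sketch of that workaround, assuming a Hadoop 2.6-era install; the jar paths and versions are illustrative and will vary per cluster:

```sh
# Put hadoop-aws and its AWS SDK dependency on the client classpath
# (paths/versions are illustrative; the jars must also be available on worker nodes).
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/usr/lib/hadoop/hadoop-aws-2.6.0.jar:/usr/lib/hadoop/lib/aws-java-sdk-1.7.4.jar

# Explicitly map the s3a scheme to its implementation for this job
hadoop distcp \
  -Dfs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  -Dfs.s3a.access.key=ACCESS_ID \
  -Dfs.s3a.secret.key=ACCESS_KEY \
  hdfs:///test/data/export/file.avro s3a://S3_BUCKET/
```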
04-14-2015 06:51 AM

1 Kudo

Alright! I figured out the fix for this. The temp buffer directory for S3 is configurable with the property "fs.s3.buffer.dir"; its default is defined in the core-default.xml config file as shown below.

```xml
<property>
  <name>fs.s3.buffer.dir</name>
  <value>${hadoop.tmp.dir}/s3</value>
  <description>Determines where on the local filesystem the S3 filesystem
    should store files before sending them to S3
    (or after retrieving them from S3).
  </description>
</property>
```

This doesn't require any service restart, so it's an easy fix.
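Since fs.s3.buffer.dir is read from the job configuration, it should also be overridable per job on the distcp command line rather than in the config files. A sketch under that assumption; the /mnt/scratch/s3 path is hypothetical and must exist and be writable on every node:

```sh
# Redirect the S3 upload buffer to a larger local volume for this job only
hadoop distcp \
  -Dfs.s3.buffer.dir=/mnt/scratch/s3 \
  -Dmapreduce.map.memory.mb=3096 \
  hdfs:///test/data/export/file.avro \
  s3n://ACCESS_ID:ACCESS_KEY@S3_BUCKET/
```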
04-10-2015 12:27 AM

1 Kudo

Hi, I am using the following command to transfer data from HDFS to S3:

```sh
hadoop distcp -Dmapreduce.map.memory.mb=3096 -Dmapred.task.timeout=60000000 \
  -i -log /tmp/export/logs \
  hdfs:///test/data/export/file.avro \
  s3n://ACCESS_ID:ACCESS_KEY@S3_BUCKET/
```

What I have noticed is that the mapper tasks that copy data to S3 first buffer it locally in the /tmp/hadoop-yarn/s3 directory on each node. This is causing disk space issues on the nodes, since the data being transferred is several TB. Is there a way to configure the mappers' temporary working directory? Can it use HDFS disk space rather than local disk space? Thanks in advance. Jagdish
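To confirm where the buffering happens, watching the local volume on a worker node while a map task runs makes the behavior visible. An illustrative check (the path matches the one reported above):

```sh
ls -lh /tmp/hadoop-yarn/s3/   # buffer files written before the upload to S3
df -h /tmp                    # local volume filling up during the copy
```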
... View more
Labels:
- Apache Hadoop
- Apache YARN
- HDFS