Member since 04-10-2015

4 Posts
2 Kudos Received
1 Solution

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7645 | 04-14-2015 06:51 AM |
			
    
	
		
		
04-15-2015 02:55 AM

Thanks, Harsh. I actually tried s3a, but it throws a filesystem exception: "java.io.IOException: No FileSystem for scheme: s3a". It looks like a JAR conflict, though I haven't had a chance to look into it in depth.
						
					
04-14-2015 06:51 AM
1 Kudo

All right, I figured out the fix for this.

The temporary buffer directory for S3 is configurable through the property "fs.s3.buffer.dir", which is defined in the core-default.xml config file.

The default config is as shown below.

<property>
  <name>fs.s3.buffer.dir</name>
  <value>${hadoop.tmp.dir}/s3</value>
  <description>Determines where on the local filesystem the S3 filesystem
  should store files before sending them to S3
  (or after retrieving them from S3).
  </description>
</property>

This doesn't require restarting any services, so it is an easy fix.
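
For a one-off job, the same property can presumably be overridden on the command line with the generic -D option, the same way the original distcp command already passes mapreduce.map.memory.mb. The following is only a minimal sketch: /data/1/s3-buffer is a hypothetical local directory assumed to exist with enough free space on every node, and it assumes the override is picked up by the map tasks. The script does not run the copy; it only prints the command as a dry run.

```shell
#!/bin/sh
# Hypothetical local directory on a volume with enough free space.
# This path is an assumption for illustration; replace it with a real one.
BUFFER_DIR="${BUFFER_DIR:-/data/1/s3-buffer}"

# Print the distcp command from the original question, with the S3 buffer
# directory overridden for this job only through the generic -D option.
distcp_cmd() {
  printf '%s ' hadoop distcp \
    "-Dfs.s3.buffer.dir=${BUFFER_DIR}" \
    -Dmapreduce.map.memory.mb=3096 \
    -Dmapred.task.timeout=60000000 \
    -i -log /tmp/export/logs \
    hdfs:///test/data/export/file.avro \
    "s3n://ACCESS_ID:ACCESS_KEY@S3_BUCKET/"
  printf '\n'
}

# Dry run: show the command so it can be reviewed before running it by hand.
distcp_cmd
```

To make the change permanent, the usual place for the override is core-site.xml, since core-default.xml only holds the shipped defaults.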
						
					
04-10-2015 12:27 AM
1 Kudo

Hi,

I am using the following command to transfer data from HDFS to S3.

hadoop distcp -Dmapreduce.map.memory.mb=3096 -Dmapred.task.timeout=60000000 -i -log /tmp/export/logs hdfs:///test/data/export/file.avro s3n://ACCESS_ID:ACCESS_KEY@S3_BUCKET/

I have noticed that the mapper task that copies data to S3 first copies the data locally into the /tmp/hadoop-yarn/s3 directory on each node. This is causing disk space issues on the nodes, since the data being transferred is terabytes in size.

Is there a way to configure the temporary working directory for the mapper? Can it use HDFS disk space rather than local disk space?

Thanks in advance.
Jagdish
						
					
Labels:
- Apache Hadoop
- Apache YARN
- HDFS


