Member since 04-20-2016

86 Posts | 27 Kudos Received | 7 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3737 | 03-13-2017 04:06 AM |
|  | 5327 | 03-09-2017 01:55 PM |
|  | 2243 | 01-05-2017 02:13 PM |
|  | 8576 | 12-29-2016 05:43 PM |
|  | 7397 | 12-28-2016 11:03 PM |
10-17-2018 12:11 PM (1 Kudo)

Great article!
10-17-2018 12:11 PM

@PJ Since you have Ranger enabled, it is possible that your permission is being denied on the Ranger side. I would definitely check the Ranger audit logs for any events for the user and see whether the permission denial shows up there. Once I had validated that it was Ranger blocking the access, I would also add a Ranger HDFS policy allowing user user1 write access to /user/user1/sparkeventlogs.
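As a rough way to confirm where the denial is coming from, here is a sketch reusing the user and path from above (the Ranger side of the check is done in the Ranger Admin UI, typically under Audit > Access, filtered by user user1):

```bash
# Check what HDFS itself reports for the directory's owner and POSIX permissions.
hdfs dfs -ls /user/user1 | grep sparkeventlogs

# Reproduce the failure as the affected user; if Ranger is the blocker, the
# denial will also appear as an audit entry for user1 in the Ranger Admin UI.
sudo -u user1 hdfs dfs -touchz /user/user1/sparkeventlogs/perm_test
```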
						
					
10-17-2018 12:04 PM

@Dukool SHarma

Safe mode is a NameNode state in which the NameNode does not accept any changes to the HDFS namespace, meaning HDFS is effectively read-only. Safe mode is entered automatically at NameNode startup, and the NameNode leaves safe mode automatically once the configured minimum percentage of blocks satisfies the minimum replication condition.

When you start the NameNode, it does not start replicating data to the DataNodes right away. It first enters this special read-only state. While in safe mode, the NameNode does not honor any requests to change its namespace, and it refrains from replicating, or even deleting, any data blocks until it leaves safe mode.

The DataNodes continuously send two things to the NameNode: a heartbeat indicating they are alive and well, and a block report listing all data blocks stored on that DataNode. Hadoop considers a data block "safely" replicated once the NameNode has received enough block reports indicating that the minimum number of replicas of that block exists. Making the NameNode wait for these block reports prevents it from prematurely re-replicating blocks whose correct number of replicas already exists on DataNodes that simply have not reported their block information yet.

When the preconfigured percentage of blocks is reported as safely replicated, the NameNode leaves safe mode and starts serving block information to clients. It also starts replicating any blocks that the DataNodes have reported as under-replicated.

Use the hdfs dfsadmin -safemode command to manage safe mode on the NameNode. You can check the current safe mode status with the -safemode get command:

```
$ hdfs dfsadmin -safemode get
Safe mode is OFF in hadoop01.localhost/10.192.2.21:8020
Safe mode is OFF in hadoop02.localhost/10.192.2.22:8020
```

You can place the NameNode in safe mode with the -safemode enter command:

```
$ hdfs dfsadmin -safemode enter
Safe mode is ON in hadoop01.localhost/10.192.2.21:8020
Safe mode is ON in hadoop02.localhost/10.192.2.22:8020
```

Finally, you can take the NameNode out of safe mode with the -safemode leave command:

```
$ hdfs dfsadmin -safemode leave
Safe mode is OFF in hadoop01.localhost/10.192.2.21:8020
Safe mode is OFF in hadoop02.localhost/10.192.2.22:8020
```
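The "configured minimum percentage" mentioned above is controlled by the dfs.namenode.safemode.threshold-pct property. A minimal sketch for checking the value in effect, assuming you run it from a node that has the HDFS client configuration:

```bash
# Print the safe mode block threshold (defaults to 0.999, i.e. 99.9% of blocks
# must satisfy minimum replication before the NameNode leaves safe mode).
hdfs getconf -confKey dfs.namenode.safemode.threshold-pct
```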
						
					
08-10-2017 12:49 PM

Are we closing the Spark context here? Usually, once a ".close()" call is made, the JVM should be able to clean up those directories.
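To check whether scratch directories are actually being left behind after the application ends, a rough sketch (assuming the default spark.local.dir of /tmp; adjust the path if spark.local.dir or the YARN local dirs point elsewhere):

```bash
# List leftover Spark scratch directories on a node, newest first.
ls -ldt /tmp/spark-* 2>/dev/null | head

# Show how much space they are holding.
du -sh /tmp/spark-* 2>/dev/null | sort -h | tail
```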
						
					
04-04-2017 02:47 PM

@Nikhil Pawar One thing you could do here is increase "spark.executor.heartbeatInterval", which defaults to 10 seconds, to something higher and test it out. It would also be worth reviewing the executor logs for any OOM / GC issues while the executors are running the jobs you kick off from Spark.
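For example, a sketch of passing a higher interval at submit time (the 60s/600s values, the class name, and the jar are placeholders to adapt, not recommendations):

```bash
# Raise the executor heartbeat interval for a single run; keep it well below
# spark.network.timeout (120s by default) or executors may be marked as lost.
# The class name and jar below are placeholders for your own application.
spark-submit \
  --conf spark.executor.heartbeatInterval=60s \
  --conf spark.network.timeout=600s \
  --class com.example.MyApp \
  my-app.jar
```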
						
					
03-14-2017 01:40 PM

@Jeff Watson Can you give us the spark-submit command and also attach the console output here for us to check?
03-13-2017 06:34 PM

Can you check what "io.file.buffer.size" is set to here? You may need to tweak it so that it stays below what "MAX_PACKET_SIZE" is set to. Referencing a great blog post here (http://johnjianfang.blogspot.com/2014/10/hadoop-two-file-buffer-size.html):

For example, take a look at the BlockSender in HDFS:

```java
class BlockSender implements java.io.Closeable {
  /**
   * Minimum buffer used while sending data to clients. Used only if
   * transferTo() is enabled. 64KB is not that large. It could be larger, but
   * not sure if there will be much more improvement.
   */
  private static final int MIN_BUFFER_WITH_TRANSFERTO = 64*1024;
  private static final int TRANSFERTO_BUFFER_SIZE = Math.max(
      HdfsConstants.IO_FILE_BUFFER_SIZE, MIN_BUFFER_WITH_TRANSFERTO);
}
```

The BlockSender uses "io.file.buffer.size" as the transfer buffer size. If this parameter is not defined, the default buffer size of 64KB is used. This explains why most Hadoop IOs were either 4K or 64K chunks in my friend's cluster, since he did not tune the cluster. To achieve better performance, we should tune "io.file.buffer.size" to a much bigger value, for example up to 16MB. The upper limit is set by MAX_PACKET_SIZE in org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.
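To see what value is currently in effect, a quick sketch (run from a node with the cluster client configuration deployed):

```bash
# Print the effective io.file.buffer.size in bytes; the Hadoop default is 4096,
# and many distributions raise it to 65536 or 131072.
hdfs getconf -confKey io.file.buffer.size
```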
						
					
03-13-2017 06:19 PM

Try running it in debug mode and then provide the output here. For the Hive CLI you could do it as below:

```
hive --hiveconf hive.root.logger=DEBUG,console
```

Once done, re-run the query and see where it fails. That should give you better insight into the failure here.
03-13-2017 06:16 PM

@Saikiran Parepally Please accept the answer if it helped resolve the issue.
03-13-2017 04:12 AM

@Saikiran Parepally Did that fix the issue here?