Member since: 03-23-2016
- 21 Posts
- 5 Kudos Received
- 1 Solution
        My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
|  | 6876 | 08-12-2016 12:05 PM |
04-06-2017 12:07 PM · 1 Kudo

This problem has been happening on our side for many months as well, with both Spark 1 and Spark 2, and both when running jobs in the shell and in Python notebooks. It is very easy to reproduce: just open a notebook and let it run for a couple of hours, or run some simple DataFrame operations in an infinite loop, as sketched below. There seems to be something fundamentally wrong with the timeout configuration in the core of Spark; we will open a case for it, because no matter what configurations we have tried, the problem persists.
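A minimal sketch of that reproduction loop, assuming a PySpark notebook or shell where a `SparkSession` named `spark` is already available (the DataFrame contents and column name are illustrative only, not from the original post):

```python
import time

# Small throwaway DataFrame; the content does not matter,
# only that the session keeps issuing work over a long period.
df = spark.range(1000)

while True:
    # A trivial transformation plus an action, repeated forever.
    df.selectExpr("id * 2 AS doubled").count()
    # Idle between actions, mimicking a long-running notebook;
    # after a couple of hours the timeout errors start to appear.
    time.sleep(60)
```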
						
					
08-12-2016 12:05 PM · 1 Kudo

I found the cause of the problem: it was a configuration issue. The NameNode was in fact installed on master01, but the following parameter was set to worker02 (which runs no NameNode): dfs.namenode.http-address was worker02.cl02.sr.private:50070 instead of master01.cl02.sr.private:50070. The configuration had been altered because the cluster was moved to an HA configuration and then taken back to non-HA; one of the NameNodes (the one on worker02) was then deleted without noticing that the remaining configuration still pointed to worker02. The corrected entry is sketched below. Hope I'm clear 🙂
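For anyone checking their own cluster, this is roughly what the corrected hdfs-site.xml entry would look like (hostnames taken from the post above; substitute your own NameNode host):

```xml
<!-- dfs.namenode.http-address must point at the host that actually
     runs the NameNode (here master01, not worker02). -->
<property>
  <name>dfs.namenode.http-address</name>
  <value>master01.cl02.sr.private:50070</value>
</property>
```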
						
					