Member since 02-01-2019
650 Posts, 143 Kudos Received, 117 Solutions
My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 3502 | 04-01-2019 09:53 AM |
| | 1812 | 04-01-2019 09:34 AM |
| | 8901 | 01-28-2019 03:50 PM |
| | 1971 | 11-08-2018 09:26 AM |
| | 4468 | 11-08-2018 08:55 AM |
11-26-2016 06:08 PM

@Marco Chou: Use the IP allocated to the box (the IP reported by ifconfig) instead of 127.0.0.1 to connect through the browser.
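A quick way to find that address on a typical Linux box (a sketch; `hostname -I` is Linux-specific, and which interface comes first depends on the machine):

```shell
# Print the first IPv4 address assigned to this host; use this in the
# browser URL instead of 127.0.0.1 (which only resolves inside the box).
hostname -I | awk '{print $1}'
```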
11-25-2016 08:30 PM (2 Kudos)

@Oliver Meyn This is the correct JIRA: https://issues.apache.org/jira/browse/SPARK-12177. And yes, SASL_SSL is only available from Spark 2.0 onward; it is not in HDP 2.4.2, which ships Spark 1.6.1.
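For reference, the client-side settings involved look roughly like the following Kafka consumer properties (these are standard Kafka configuration names; the paths and password are placeholders):

```properties
# Use SASL over TLS when talking to the brokers
security.protocol=SASL_SSL
ssl.truststore.location=/path/to/client.truststore.jks
ssl.truststore.password=changeit
# Kerberos principal name the brokers run under
sasl.kerberos.service.name=kafka
```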
11-22-2016 05:51 PM (1 Kudo)

@Fernando Lopez Bello You need a HiveContext to access Hive tables:

```python
from pyspark.sql import HiveContext

# Build a HiveContext on top of the existing SparkContext (sc)
sqlCtx = HiveContext(sc)
```
11-20-2016 02:34 PM

@Jayanna TM You can use the ignorePattern property to make a spooling directory source skip .tmp files (see https://flume.apache.org/FlumeUserGuide.html):

```properties
ignorePattern = ^.*\.tmp$
```
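A sketch of where that property sits in a spooling-directory source definition (the agent/source names `a1`/`r1` and the spool path are placeholders; the property names are from the Flume user guide):

```properties
a1.sources = r1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/spool/flume
# Skip files still being written, e.g. data.csv.tmp
a1.sources.r1.ignorePattern = ^.*\.tmp$
```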
10-25-2016 11:54 AM

@bigdata.neophyte You would need to use this API to fetch the job status: https://hadoop.apache.org/docs/r2.4.1/api/org/apache/hadoop/mapreduce/JobStatus.html

If you want a simple solution, you could try something like:

1) Set a unique job name (e.g. a date or timestamp) using -Dmapred.job.name=testdist01
2) Get the application status using:

```shell
yarn application -list -appStates ALL,NEW,NEW_SAVING,SUBMITTED,ACCEPTED,RUNNING,FINISHED,FAILED,KILLED \
  | grep -i "distcp: testdist01" \
  | awk '{print $7,$8}'
# Example output: FINISHED SUCCEEDED
```
09-27-2016 09:35 AM

@Mourad Chahri Can you check whether you have enough disk space available on the node?
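A quick way to check (a sketch; run it on the affected node, since mount points and data directories vary by cluster):

```shell
# Show free space on every mounted filesystem in human-readable units;
# look for any volume (e.g. the one holding the data dirs) near 100% use.
df -h
```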
09-27-2016 09:26 AM

@Muthukumar S: You need to either add the AWS keys to the hadoop command or set them permanently in core-site.xml.

Are you able to run hadoop fs -ls s3a://${BUCKET_NAME}/ (adding keys as needed)? This is to isolate whether it is an authentication or connectivity issue.
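If you go the permanent route, the core-site.xml entries look like this (fs.s3a.access.key and fs.s3a.secret.key are the standard s3a credential properties; the values below are placeholders):

```xml
<!-- core-site.xml fragment: static AWS credentials for the s3a connector -->
<property>
  <name>fs.s3a.access.key</name>
  <value>YOUR_AWS_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.secret.key</name>
  <value>YOUR_AWS_SECRET_KEY</value>
</property>
```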
09-21-2016 03:37 PM

@Mats Johansson By "Spark on R" I mean running Spark on R Server. Which one is recommended: Spark on R or SparkR? I would also like to know how the two compare in performance.
08-29-2016 01:22 PM (1 Kudo)

@Ryan Spring Please use the dependencies mentioned here: https://community.hortonworks.com/questions/27966/kafkaspout-fails-with-zookeeper-socket-issues-in-k.html. The securityProtocol property is available in the Hortonworks repo.
08-26-2016 09:12 AM

@Roberto Sancho: From the trace it looks like the connection is timing out. Can you check?

Server access error at url https://repo1.maven.org/maven2/com/databricks/spark-csv_2.10/1.3.0/spark-csv_2.10-1.3.0.jar (java.net.ConnectException: Connection timed out)