Member since: 02-08-2016

Posts: 39
Kudos Received: 29
Solutions: 5

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1964 | 06-22-2017 05:05 PM |
|  | 2734 | 03-26-2017 11:55 PM |
|  | 2892 | 07-18-2016 03:15 PM |
|  | 19538 | 06-29-2016 07:43 PM |
|  | 2047 | 06-20-2016 06:11 PM |

10-02-2017 04:19 PM
Useful for Maven-based builds... thanks!

06-22-2017 06:58 PM
You are probably using Hive on Tez. There is a user-level explain for Hive on Tez users. Apply the setting below and then run your 'explain' query to see a much more readable tree of operations. The same feature is also available for Hive on Spark, where the setting is called 'hive.spark.explain.user'.

set hive.explain.user=true
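If you want to apply the setting from a script rather than the Hive shell, here is a minimal sketch using the PyHive client; the HiveServer2 host, port, and table name are placeholders I made up, not details from this thread.

```python
# Sketch only: assumes PyHive is installed and HiveServer2 is reachable
# at the placeholder host below.
from pyhive import hive

cursor = hive.connect(host="hiveserver2.example.com", port=10000).cursor()

# Enable the user-level explain (Hive on Tez).
cursor.execute("SET hive.explain.user=true")

# Any EXPLAIN run in this session now returns the readable operator tree.
cursor.execute("EXPLAIN SELECT count(*) FROM sample_table")
for row in cursor.fetchall():
    print(row[0])
```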

06-22-2017 05:05 PM
If you are doing this on a single node in the cluster, then yes, delete the original copied data files and the NameNode will take care of recreating the missing data files (it re-replicates the lost block copies from the remaining replicas).
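If it helps, one illustrative way to watch the NameNode finish restoring the replicas is hdfs fsck; this check is my suggestion and was not part of the original question.

```python
# Illustrative only: run "hdfs fsck" and print the replication summary lines.
# Once the NameNode has finished re-replicating, the under-replicated and
# missing block counts should return to 0.
import subprocess

report = subprocess.run(
    ["hdfs", "fsck", "/", "-blocks"],
    capture_output=True,
    text=True,
).stdout

for line in report.splitlines():
    if "Under-replicated blocks" in line or "Missing blocks" in line:
        print(line.strip())
```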

03-27-2017 02:25 AM
One option is to drop the existing external table and create a new table that includes the new column. Since this is a Hive metadata-only operation, your data files won't be touched. The downside is that you will have to execute ALTER TABLE commands to redefine the partitions on the new table.
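As a rough sketch of that sequence (the table, column, and partition names are made up, and the PyHive connection is just one way to run the statements; they work the same from the Hive shell):

```python
# Hypothetical example of the drop/recreate approach for an external table.
from pyhive import hive

statements = [
    # Dropping an EXTERNAL table only removes metadata; the files stay in place.
    "DROP TABLE IF EXISTS events",
    """CREATE EXTERNAL TABLE events (
           id BIGINT,
           payload STRING,
           new_col STRING
       )
       PARTITIONED BY (dt STRING)
       LOCATION '/data/events'""",
    # Re-register each existing partition against the new table definition.
    "ALTER TABLE events ADD PARTITION (dt='2017-03-01') "
    "LOCATION '/data/events/dt=2017-03-01'",
]

cursor = hive.connect(host="hiveserver2.example.com", port=10000).cursor()
for stmt in statements:
    cursor.execute(stmt)
```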

03-26-2017 11:55 PM · 2 Kudos
In a production-type cluster (tens to 100+ nodes) with NameNode HA enabled, the best practice is to have 2 NameNodes (1 active and 1 standby) and 3 JournalNodes.

08-17-2016 04:31 AM
Spark Standalone mode is Spark's own built-in clustered environment. The Standalone Master is the resource manager for the Spark Standalone cluster, and the Standalone Workers are the workers in that cluster. To install Spark Standalone mode, you simply place a compiled version of Spark on each node of the cluster. You can launch the standalone cluster either manually, by starting a master and workers by hand, or by using the launch scripts.

In most enterprises you already have a Hadoop cluster running YARN and want to leverage it for resource management instead of additionally running Spark Standalone mode. When using YARN, the Spark application runs its Spark master and Spark workers within YARN containers.

Irrespective of the deployment mode, a Spark application will consume the same resources it needs to process the data. In the YARN case you have to be aware of what other workloads (MR, Tez, etc.) will be running on the cluster at the same time the Spark application is executing, and size your machines accordingly.
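A minimal sketch of how the same application code targets either cluster manager; the master URL and host names below are placeholders:

```python
# Illustrative only: the standalone master URL is a placeholder.
from pyspark.sql import SparkSession

# Spark Standalone: point the application at the master you started yourself.
spark = (SparkSession.builder
         .appName("example")
         .master("spark://standalone-master.example.com:7077")
         .getOrCreate())

# On YARN the same code runs inside YARN containers; typically you set the
# master via `spark-submit --master yarn` instead of in code:
# spark = SparkSession.builder.appName("example").master("yarn").getOrCreate()

df = spark.range(1000)   # trivial workload just to show the application runs
print(df.count())
spark.stop()
```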

08-15-2016 03:17 AM
Can you share the full details of the error message? You are probably missing some libraries. If you are reading data from HDFS and running in yarn-cluster mode, your parallelism will by default be equal to the number of HDFS blocks. As a best practice, avoid the collect operation unless it's a small test dataset, and instead use the saveAsTextFile method to write the result dataset to HDFS or a local file.
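A small sketch of that advice (the HDFS paths are placeholders):

```python
# Sketch: default parallelism follows HDFS blocks; write results out instead
# of collecting them to the driver.
from pyspark import SparkContext

sc = SparkContext(appName="example")

# One partition per HDFS block by default when reading from HDFS.
rdd = sc.textFile("hdfs:///data/input/big_file.txt")
print("partitions:", rdd.getNumPartitions())

result = rdd.filter(lambda line: "ERROR" in line)

# Prefer saveAsTextFile over collect(), which pulls the whole result
# into the driver's memory.
result.saveAsTextFile("hdfs:///data/output/errors")
# result.collect()  # only for small test datasets

sc.stop()
```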

08-10-2016 07:49 PM
What version of Spark and HDP? Can you list all the jars under the SPARK_HOME directory on a worker machine in the cluster?

08-08-2016 04:33 AM
In what mode are you running the Spark application? spark-shell, yarn-client, etc.?

07-18-2016 03:15 PM · 2 Kudos
We explicitly listed the FQDNs of all hosts in both clusters under the [domain_realm] section of the krb5.conf file. We have to update this file every time we add a node to our clusters. Our clusters are currently under 100 nodes, so this solution is manageable, but for large clusters it may be a challenge.