Member since: 01-09-2019

401 Posts · 163 Kudos Received · 80 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2596 | 06-21-2017 03:53 PM |
| | 4294 | 03-14-2017 01:24 PM |
| | 2388 | 01-25-2017 03:36 PM |
| | 3840 | 12-20-2016 06:19 PM |
| | 2101 | 12-14-2016 05:24 PM |
			
    
	
		
		
05-30-2016 07:16 PM

Please take a look at this post: https://community.hortonworks.com/questions/2349/tip-when-you-get-a-message-in-job-log-user-dr-who.html
There are different ways to fix this issue; one of them is to set hadoop.http.staticuser.user=yarn in core-site.xml. More details are in the linked thread.
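For reference, the property goes into core-site.xml in the standard Hadoop property format; a minimal sketch:

```xml
<!-- core-site.xml: static user the web UIs report when no user is authenticated
     (the default is dr.who, which is where the "dr.who" jobs come from) -->
<property>
  <name>hadoop.http.staticuser.user</name>
  <value>yarn</value>
</property>
```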
			
    
	
		
		
05-30-2016 07:13 PM

It looks like you have issues with your ResourceManager 1, and your app is trying to fail over to ResourceManager 2. Take a look at the RM1 logs to see what the error is. It is either down or not responding on port 8032 in time.
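A few quick checks you could run first; the RM ids, hostname, and log path below are assumptions based on typical HDP defaults (your yarn.resourcemanager.ha.rm-ids and log locations may differ):

```sh
# Which RM is active? (ids come from yarn.resourcemanager.ha.rm-ids)
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2

# Is RM1 reachable on the client port at all? (hostname is a placeholder)
nc -vz rm1.example.com 8032

# Look for the underlying error in the RM1 log
tail -n 200 /var/log/hadoop-yarn/yarn/yarn-yarn-resourcemanager-*.log
```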
			
    
	
		
		
05-30-2016 07:07 PM

Check whether you can passwordless-ssh to the same host (hadoop1) using the key. I believe you set this up from hadoop1 to connect to hadoop2, hadoop3, and hadoop4, but not to hadoop1 itself. If this does not work, you can try manual registration of the Ambari agents, following http://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_ambari_reference_guide/content/ch_amb_ref_installing_ambari_agents_manually.html
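A quick way to test and, if needed, fix this; the key path and user are assumptions (use whatever key and account you registered with Ambari):

```sh
# From hadoop1, this should succeed without a password prompt
ssh -i ~/.ssh/id_rsa root@hadoop1 hostname

# If it prompts, authorize the key on hadoop1 itself, just as on the other hosts
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```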
			
    
	
		
		
05-29-2016 07:54 PM

1) You can use base permissions on HDFS and grant any additional permissions using Ranger. So, in the case of /data, you can start with 750, and if anyone in the group needs write permission, you can add it using a Ranger policy.

3) The user will have access. As I said in 1), you can put minimum permissions on HDFS and add additional permissions using Ranger.

4) You can still access this directly if Hive has doAs enabled and you are accessing from HiveServer2. This is the reason why you may have to duplicate access restrictions on both HDFS and Hive columns if you have access from both the Hive CLI and HiveServer2. It is almost the same case with HBase.

5) As in 1), you can put minimal permissions on HDFS and then add additional permissions using Ranger. That means you could go with 700 too, but that will add more overhead in creating policies.
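A minimal sketch of the base-permission pattern from 1); the owner and group names are assumptions for illustration:

```sh
# Base permissions on HDFS: owner full, group read/list, others nothing.
# Anything beyond this (e.g. group write) is granted via a Ranger policy.
hdfs dfs -chown hdfs:analysts /data   # owner and group are placeholders
hdfs dfs -chmod 750 /data
```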
			
    
	
		
		
05-29-2016 07:42 PM

Please post the errors that you are seeing, both in the Ambari UI and in ambari-server.log. The ambari-agent logs might also provide some insight into what is going on. Have you also executed 'ambari-server reset' before the reinstall?
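For reference, the default HDP log locations and the reset sequence; note that 'ambari-server reset' wipes the Ambari server database:

```sh
# Default log locations to check
tail -n 100 /var/log/ambari-server/ambari-server.log
tail -n 100 /var/log/ambari-agent/ambari-agent.log

# Caution: destructive; drops the Ambari database before a reinstall
ambari-server stop
ambari-server reset
```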
			
    
	
		
		
05-27-2016 04:20 PM

Whenever you are evaluating Hive on Tez against any other tool for data analytics (with row-level update/access patterns), my suggestion is to start with Hive: apply all the right tunings at the OS, cluster, and Hive levels, use ORC and bloom filters, organize your data, and see if query times hit your SLA. We have seen at a lot of places that once they tune Hive correctly and move away from text files, they hit their SLAs. You can then look at other tools if and when your SLAs are not met.
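As a hedged illustration of the ORC and bloom-filter advice, a hypothetical table definition (the table, columns, and property values are made up for the example):

```sql
-- Store as ORC instead of text, with a bloom filter on the common lookup column
CREATE TABLE events (
  event_id BIGINT,
  user_id  STRING,
  payload  STRING
)
STORED AS ORC
TBLPROPERTIES (
  'orc.compress' = 'ZLIB',
  'orc.bloom.filter.columns' = 'user_id'
);
```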
			
    
	
		
		
05-26-2016 04:26 PM

Yes, that's correct.
			
    
	
		
		
05-26-2016 03:07 PM

Try setting this on the SparkContext as shown below. This works for file loads, and I believe it should work for Hive table loads as well:

    sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")
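A minimal usage sketch, assuming a hypothetical nested layout such as /data/logs/2016/05/part-00000:

```scala
// With the recursive flag set, textFile also picks up files in subdirectories
sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.input.dir.recursive", "true")

val logs = sc.textFile("/data/logs")   // path is a placeholder
println(logs.count())
```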
			
    
	
		
		
05-26-2016 02:46 PM

Q: Teeing vs. copying: which one is preferred over the other? I understand it is scenario-dependent, but which has better adaptability and is more widely used in the industry? Copying?
A: With teeing, you can split primary tasks between the two clusters and use the other cluster as DR for that task. As an example, if you have clusters C1 and C2, you can use C1 as the primary cluster and C2 as DR for some teams/tasks, and C2 as the primary cluster and C1 as DR for other users/tasks.

Q: Is it necessary to have both the main and the DR cluster on the same version of HDP? If not, what are the things to consider if the same version is not possible?
A: It is convenient to have them both on the same version. This is especially the case if you want to use DR with almost no code changes when the primary cluster is down.

Q: Should the topology be like-for-like between the clusters in terms of component placement, including gateway nodes and ZooKeeper services?
A: This is not required.

Q: How does security play out for DR? Should both clusters' nodes be part of the same Kerberos realm, or can they be part of different realms?
A: For DR, the same realm is a lot easier to manage than cross-realm, but cross-realm is possible.

Q: Can the replication factor be lower, or is it recommended to keep it the same as on the primary cluster?
A: I have seen replication factor 2 used on DR clusters, but if the DR cluster becomes the primary after a disaster, you may have to change the replication factor to 3 on all data sets.

Q: Are there any specific network requirements in terms of latency, speed, etc. between the clusters?
A: For distcp, each node on one cluster should be able to communicate with each of the nodes on the second cluster.

Q: Is there a need to run the balancer on the DR cluster periodically?
A: Yes. It is always good to run the balancer to keep a similar number of blocks across nodes.

Q: How does encryption play out between the primary and DR clusters? If encryption at rest is enabled on the primary one, how is it handled on the DR cluster? What are the implications of wire encryption while transferring data between the clusters?
A: Wire encryption will slow down transfers a little.

Q: When HDFS snapshots are enabled on the primary cluster, how does it work when data is being synced to the DR cluster? Can snapshots be exported to another cluster? I understand this is possible for HBase snapshots, but is it allowed in the HDFS case? For example, if a file is deleted on the primary cluster but is still available in a snapshot, will that be synced to the snapshot directory on the DR cluster?
A: If you are using snapshots, you can simply run distcp against the snapshots instead of the actual data set (see the sketch after this list).

Q: For services which involve databases (Hive, Oozie, Ambari), instead of backing up periodically from the primary cluster to the DR cluster, is it recommended to set up one HA master in the DR cluster directly?
A: I don't think automating Ambari is a good idea. Configs don't change that much, so a simple process of duplicating them might be better. Backing up would mean you need the same hostnames and the same topology. For Hive, instead of a complete backup, Falcon can take care of table-level replication.

Q: For configurations and application data, instead of backing up at regular intervals, is there a way to keep them in sync between the primary and DR clusters?
A: Not sure where your application data resides, but for configuration, since everything is managed by Ambari, you just need to keep the Ambari configuration in sync.
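A hedged sketch of the snapshot-based distcp approach; the paths, hostnames, and snapshot names are placeholders, and the -diff form additionally assumes the target still holds the starting snapshot unchanged:

```sh
# One-time: make the source directory snapshottable (run as the HDFS admin)
hdfs dfsadmin -allowSnapshot /data/warehouse

# Take a snapshot at each sync point
hdfs dfs -createSnapshot /data/warehouse s1
# ... data changes on the primary ...
hdfs dfs -createSnapshot /data/warehouse s2

# Copy only the delta between s1 and s2 to the DR cluster
hadoop distcp -update -diff s1 s2 \
  hdfs://primary-nn.example.com:8020/data/warehouse \
  hdfs://dr-nn.example.com:8020/data/warehouse
```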
			
    
	
		
		
05-26-2016 02:01 PM

Take a look at https://community.hortonworks.com/articles/25523/hdp-240-and-spark-160-connecting-to-aws-s3-buckets.html, which gives details on how to access S3 from Spark.
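For quick reference, this typically comes down to putting the S3 credentials on the Hadoop configuration; a minimal sketch using the s3a keys (the bucket and credential values are placeholders, and the linked article covers the HDP-specific jar/classpath details):

```scala
// Never hard-code real credentials in production code
sc.hadoopConfiguration.set("fs.s3a.access.key", "YOUR_ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "YOUR_SECRET_KEY")

val data = sc.textFile("s3a://your-bucket/path/to/data")
println(data.count())
```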