Member since 09-24-2015

Posts: 178
Kudos Received: 113
Solutions: 28

        My Accepted Solutions

| Title | Views | Posted |
|---|---|---|
| | 4654 | 05-25-2016 02:39 AM |
| | 4591 | 05-03-2016 01:27 PM |
| | 1197 | 04-26-2016 07:59 PM |
| | 16802 | 03-24-2016 04:10 PM |
| | 3156 | 02-02-2016 11:50 PM |

12-04-2015 01:09 AM
1 Kudo

Here is how I would do it, but in case I am missing any requirements, please feel free to add more details (without revealing any secret sauce of your logic 😉).

Assumptions first:
- Job 1 executes Step A and Step B at 00:01 AM every morning (a two-step job).
- Job 2 executes Step B every hour between 01:01 and 23:01 throughout the day (a single-step job).

Note: The timings can obviously be adjusted, but the assumption here is that the execution time of the two-step job is fixed and mutually exclusive with the other 23 executions of the single-step job. The two steps could be any action supported by Oozie, such as Hive, Pig, Email, SSH, etc. The workflow definitions will therefore have a duplicate Step B action in both jobs.

Coordinator definitions: the exact execution time and frequency are controlled by each coordinator's validity (start/end) and frequency.

For Job 1:
- Validity = 00:00 hours of the day when you want the job to start executing.
- Frequency = ${coord:days(1)}. See section 4.4.1, "The coord:days(int n) and coord:endOfDays(int n) EL functions", at http://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html

For Job 2:
- Validity = 01:00 hours of the same day as Job 1.
- Frequency = "1 1-23 * * *". Note that instead of using a fixed frequency we are using cron-type syntax, which is super cool. See section 4.4.3, "Cron syntax in coordinator frequency", at http://oozie.apache.org/docs/4.2.0/CoordinatorFunctionalSpec.html

Hope this helps.
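To make that concrete, here is a minimal sketch of the scheduling values each coordinator could run with. The HDFS paths, dates, and file names below are made up for illustration; each coordinator.xml could pick these up as ${start}, ${end}, and ${frequency}, or you can hard-code the frequency in the coordinator.xml directly.

```python
# Sketch only: scheduling values for the two coordinator jobs described above.
# Paths and dates are hypothetical; each job is submitted separately, e.g. with
# "oozie job -config job1.properties -run".

job1 = {
    "oozie.coord.application.path": "hdfs:///apps/coordinators/step-a-and-b",
    "start": "2015-12-05T00:00Z",     # validity begins at 00:00 of the chosen day
    "end": "2016-12-05T00:00Z",
    "frequency": "${coord:days(1)}",  # two-step workflow runs once per day
}

job2 = {
    "oozie.coord.application.path": "hdfs:///apps/coordinators/step-b-only",
    "start": "2015-12-05T01:00Z",     # validity begins at 01:00 of the same day
    "end": "2016-12-05T00:00Z",
    "frequency": "1 1-23 * * *",      # cron syntax: minute 1 of hours 01-23, i.e. 23 runs a day
}

# Write each set of values out as a job.properties file for submission.
for filename, props in (("job1.properties", job1), ("job2.properties", job2)):
    with open(filename, "w") as f:
        for key, value in props.items():
            f.write("{}={}\n".format(key, value))
```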
						
					

12-03-2015 09:45 PM

Without spending too much time on it, this appears to be a defect to me. @Balu, any thoughts?
						
					

12-03-2015 09:13 PM
2 Kudos

Hi Ravi,

As you know, the property dfs.datanode.handler.count defines the number of server threads for the DataNode, so it applies at the DataNode level. In other words, its value is driven more by the I/O requests hitting the DataNode than by the size of the cluster.

So, hypothetically speaking, if you have a cluster (large or small) used for an online-archiving use case where the data is not read very often, you do not need a large number of parallel threads. As the traffic/I/O goes up, there may be a benefit in increasing the number of handler threads on the DataNode. Here is the code that uses this property.

If there is a way to isolate the heavy workers from the light workers, you can create Ambari configuration groups to set different values for this property.
						
					

12-03-2015 08:57 PM

Moved the question to the "Governance and Lifecycle" track.
						
					

12-03-2015 08:53 PM

Ambari can only manage the principals and keytabs for the services that it manages. The principals and keytabs are actually defined as part of the configuration files in the stack definition.

For example, for Storm, looking at the stack definition, you can see:

....
          "name": "storm_components",
          "principal": {
            "value": "${storm-env/storm_user}-${cluster_name}@${realm}",
            "type": "user",
            "configuration": "storm-env/storm_principal_name"
          },
          "keytab": {
            "file": "${keytab_dir}/storm.headless.keytab",
            "owner": {
              "name": "${storm-env/storm_user}",
              "access": "r"
            },
....

Ambari does not support managing principals and keytabs of components that are outside its purview.
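If you want to see exactly which identities a stack defines, a quick way is to read the service's kerberos.json descriptor directly from the Ambari server. This is just a rough sketch; the path below is an assumption and varies by stack version:

```python
import json

# Rough sketch: list the principals and keytabs declared in a service's
# kerberos.json descriptor. The path is an example and depends on the stack
# version installed on the Ambari server.
DESCRIPTOR = "/var/lib/ambari-server/resources/common-services/STORM/0.9.1.2.1/kerberos.json"

def identities(block):
    """Yield the identity entries of a service block and of its components."""
    for identity in block.get("identities", []):
        yield identity
    for component in block.get("components", []):
        for identity in component.get("identities", []):
            yield identity

with open(DESCRIPTOR) as f:
    descriptor = json.load(f)

for service in descriptor.get("services", []):
    print("Service:", service.get("name"))
    for identity in identities(service):
        principal = identity.get("principal", {}).get("value", "-")
        keytab = identity.get("keytab", {}).get("file", "-")
        print("  {:<20} principal={:<50} keytab={}".format(
            identity.get("name", "?"), principal, keytab))
```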
						
					

12-03-2015 08:10 PM
1 Kudo

There is no clean way to do this within the same Oozie job. If the time when Step A and Step B have to be executed together is fixed, then IMHO a better approach would be to set up two different Oozie jobs: one with both steps that runs once a day, and another with Step B only that runs the other 23 times.
						
					

12-02-2015 09:32 PM

I've seen very little ext3 and mostly ext4 for on-prem deployments. AWS EBS is XFS by default. XFS has its advantages, but in a JBOD setup it doesn't really provide a lot of benefits.
						
					

12-01-2015 01:12 AM
4 Kudos

This post by Lester Martin sums it up really well: https://martin.atlassian.net/wiki/pages/viewpage.action?pageId=36044812

Here is a summary:
- Since most OS patching/upgrades require a reboot, it is best to schedule such an activity around a planned outage.
- It is also recommended to go through the exercise in a lower-level environment before applying the changes in a PROD environment.
- To apply the changes while the cluster is up, the patch/upgrade has to be applied in a rolling manner: stop the components on the host from Ambari, apply the changes, reboot the host, and then start the Hadoop services from Ambari again. Repeat for each host.

This process will have to be scripted for a large cluster. The stop and start steps can be performed using the Ambari APIs, as sketched below.
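For example, something along these lines could drive the stop/patch/start cycle per host through the Ambari REST API. This is a rough, untested sketch; the Ambari URL, credentials, cluster name, and host names are placeholders:

```python
import requests

# Rough sketch: stop and start all Hadoop components on one host via the Ambari
# REST API. URL, credentials, cluster and host names below are placeholders.
AMBARI = "http://ambari.example.com:8080"
AUTH = ("admin", "admin")
HEADERS = {"X-Requested-By": "ambari"}
CLUSTER = "mycluster"

def set_host_components_state(host, state, context):
    """PUT the desired state (INSTALLED = stopped, STARTED = running) for all
    components on the given host; Ambari queues a request that can be polled."""
    url = "{}/api/v1/clusters/{}/hosts/{}/host_components".format(AMBARI, CLUSTER, host)
    body = {
        "RequestInfo": {"context": context},
        "Body": {"HostRoles": {"state": state}},
    }
    response = requests.put(url, json=body, auth=AUTH, headers=HEADERS)
    response.raise_for_status()
    return response

for host in ["worker01.example.com"]:   # patch one host at a time
    set_host_components_state(host, "INSTALLED", "Stop components before patching")
    # ... wait for the stop request to complete, apply the OS patch, reboot the host ...
    set_host_components_state(host, "STARTED", "Start components after patching")
```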
						
					

11-18-2015 05:49 PM

You mean DDL? Yeah, agreed. But the OP is asking to "load it into an existing Hive table", so just an insert.
						
					

11-18-2015 05:44 PM

Great, looking forward to hearing the results.
						
					