Member since: 01-09-2019

- 401 Posts
- 163 Kudos Received
- 80 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2594 | 06-21-2017 03:53 PM |
| | 4288 | 03-14-2017 01:24 PM |
| | 2388 | 01-25-2017 03:36 PM |
| | 3832 | 12-20-2016 06:19 PM |
| | 2098 | 12-14-2016 05:24 PM |

08-08-2016 04:17 AM
So you have one dead node where port 50010 is already taken by some process, which is why the DataNode is not starting. It could be a case of the DataNode process not shutting down cleanly. You can get the process ID from netstat and see if kill -9 clears that port.
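A minimal sketch of that check, assuming a Linux host with netstat available (the PID shown is a placeholder from example output):

```bash
# Find the process currently holding the DataNode port (50010)
netstat -tlnp | grep :50010
# Example output: tcp  0  0 0.0.0.0:50010  0.0.0.0:*  LISTEN  12345/java

# If it is a stale DataNode process, kill it using the PID from the last column
kill -9 12345

# Confirm the port is free before restarting the DataNode
netstat -tlnp | grep :50010 || echo "port 50010 is free"
```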
						
					
08-08-2016 04:14 AM
How big are your dimension tables? For best speed, some denormalization will help. However, with the various improvements to Hive, and if your dimension tables are small enough for a map join, you may not see much difference between the two.
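For reference, a rough way to verify whether the map join kicks in; the table names (fact_orders, dim_customer), warehouse path, and threshold value are illustrative, not from the original question:

```bash
# Check the on-disk size of the dimension table (path is illustrative)
hdfs dfs -du -s -h /apps/hive/warehouse/dim_customer

# EXPLAIN shows whether Hive chooses a map join once auto-conversion is on;
# the size threshold below (bytes) is an example value, not a recommendation
hive -e "
SET hive.auto.convert.join=true;
SET hive.auto.convert.join.noconditionaltask.size=25000000;
EXPLAIN
SELECT f.order_id, d.customer_name
FROM fact_orders f
JOIN dim_customer d ON f.customer_id = d.customer_id;
"
```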
						
					
08-03-2016 02:36 PM
The best way to change dfs.datanode.data.dir is to decommission the DataNode, change the value, and add it back. This avoids directly accessing files and moving them around. You could move the contents from one directory to another without decommissioning, but that is not recommended.

For NameNode metadata directories, you can change the value and restart the standby NN first. Once the standby NN comes back (it will start using the new directory), restart the active NN; it picks up the new directory on restart as well. Once both NNs are up, you can clear the contents of the old metadata directories.
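Roughly, the decommission route looks like this, assuming dfs.hosts.exclude already points at an excludes file; the hostname and paths are placeholders:

```bash
# 1. Add the DataNode to the excludes file referenced by dfs.hosts.exclude
echo "dn1.example.com" >> /etc/hadoop/conf/dfs.exclude

# 2. Tell the NameNode to re-read the host lists and begin decommissioning
hdfs dfsadmin -refreshNodes

# 3. Wait until the node shows "Decommissioned" in the report
hdfs dfsadmin -report | grep -A1 dn1.example.com

# 4. Update dfs.datanode.data.dir in hdfs-site.xml on that node, remove it
#    from the excludes file, refresh again, and restart the DataNode
hdfs dfsadmin -refreshNodes
```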
						
					
07-27-2016 08:49 PM (2 Kudos)
For smaller distcp jobs, I think the setup time of the dynamic strategy will be longer than that of the uniform-size strategy. And if all maps run at similar speeds, you won't gain much from the dynamic strategy while still paying the setup cost.

However, not all maps run at similar speeds. With the dynamic strategy, slower maps end up processing less data and faster maps process more. I don't have the exact data size at which one works better than the other, but in general, on larger datasets and on heterogeneous clusters (where not all workers have the same hardware), the dynamic strategy has the advantage.
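For reference, the strategy is picked on the distcp command line; the cluster addresses and paths below are placeholders:

```bash
# Default behavior: uniform-size chunking, each map gets a similar byte count
hadoop distcp hdfs://nn1:8020/src/path hdfs://nn2:8020/dst/path

# Dynamic strategy: maps pull work from a shared queue of chunks, so faster
# maps end up copying more data than slower ones (at the cost of extra setup)
hadoop distcp -strategy dynamic -m 20 \
    hdfs://nn1:8020/src/path hdfs://nn2:8020/dst/path
```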
						
					
07-09-2016 02:24 AM
@zblanco Can Knox SSO use an existing Kerberos ticket to authenticate?
						
					
07-07-2016 08:20 PM
Thanks. If you already have a JIRA for this, please post it so we can keep track.
						
					
07-07-2016 07:52 PM
I'm running into the error below on a secure cluster with KMS configured. The surprising part is that this exception occurs only for one user, who is no different from the other users who don't hit it. Any thoughts on when this error can occur?

java.io.IOException: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:892)
        at org.apache.hadoop.crypto.key.KeyProviderDelegationTokenExtension.addDelegationTokens(KeyProviderDelegationTokenExtension.java:86)
        at org.apache.hadoop.hdfs.DistributedFileSystem.addDelegationTokens(DistributedFileSystem.java:2291)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:121)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:206)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
        at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:242)
        at org.apache.spark.rdd.RDD$anonfun$partitions$2.apply(RDD.scala:240)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:240)
        at org.apache.spark.api.java.JavaRDDLike$class.partitions(JavaRDDLike.scala:64)
        at org.apache.spark.api.java.AbstractJavaRDDLike.partitions(JavaRDDLike.scala:46)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
        at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:381)
        at py4j.Gateway.invoke(Gateway.java:259)
        at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
        at py4j.commands.CallCommand.execute(CallCommand.java:79)
        at py4j.GatewayConnection.run(GatewayConnection.java:209)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1727)
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider.addDelegationTokens(KMSClientProvider.java:874)
        ... 30 more
Caused by: org.apache.hadoop.security.authentication.client.AuthenticationException: Authentication failed, status: 400, message: Bad Request
        at org.apache.hadoop.security.authentication.client.AuthenticatedURL.extractToken(AuthenticatedURL.java:274)
        at org.apache.hadoop.security.authentication.client.PseudoAuthenticator.authenticate(PseudoAuthenticator.java:77)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
        at org.apache.hadoop.security.authentication.client.KerberosAuthenticator.authenticate(KerberosAuthenticator.java:214)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.authenticate(DelegationTokenAuthenticator.java:128)
        at org.apache.hadoop.security.authentication.client.AuthenticatedURL.openConnection(AuthenticatedURL.java:215)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.doDelegationTokenOperation(DelegationTokenAuthenticator.java:285)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticator.getDelegationToken(DelegationTokenAuthenticator.java:166)
        at org.apache.hadoop.security.token.delegation.web.DelegationTokenAuthenticatedURL.getDelegationToken(DelegationTokenAuthenticatedURL.java:371)
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider$2.run(KMSClientProvider.java:879)
        at org.apache.hadoop.crypto.key.kms.KMSClientProvider$2.run(KMSClientProvider.java:874)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
        ... 31 more 
						
					
Labels: Apache Hadoop
    
	
		
		
07-07-2016 07:18 PM
I'm working with AD integration via LDAP and wondering whether Ambari can use an existing Kerberos ticket instead of an explicit login. This is an AD where users don't have passwords but hardware-key-based passcodes. Once they log in to their machine, they have a valid Kerberos ticket from AD. Can this be used instead of asking the user to log in?
						
					
Labels: Apache Ambari
    
	
		
		
07-07-2016 04:24 PM
I am not sure what they mean by ORC not being a general-purpose format. In any case, you are still going through HCatalog (there are HCatalog APIs for MapReduce and Pig).

When I said you can transform this data as necessary, I mean things like creating new partitions, buckets, sort orders, and bloom filters, and even redesigning tables for better access.

There will be data duplication with any data transform if you want to keep the raw data as well.
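A sketch of the kind of redesigned ORC table I mean; the table name, columns, and bucket count are made-up examples:

```bash
hive -e "
CREATE TABLE orders_orc (
  order_id    BIGINT,
  customer_id BIGINT,
  amount      DOUBLE
)
PARTITIONED BY (order_date STRING)
CLUSTERED BY (customer_id) SORTED BY (customer_id) INTO 32 BUCKETS
STORED AS ORC
TBLPROPERTIES ('orc.bloom.filter.columns'='customer_id');
"
```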
						
					
07-07-2016 12:42 PM (1 Kudo)
Avoid using the root user to run jobs in a Kerberized cluster. There is a configuration to set the minimum UID, and unless you have a specific requirement to run as root, don't use it. The parameter is min.user.id and the default is 1000; Linux system and service accounts generally have UIDs below 1000.
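The setting lives in container-executor.cfg on the NodeManager hosts; a quick way to check it (the config path varies by distribution):

```bash
# Show the minimum UID allowed to launch YARN containers, plus banned users
grep -E 'min.user.id|banned.users' /etc/hadoop/conf/container-executor.cfg
# Typical contents:
#   min.user.id=1000
#   banned.users=hdfs,yarn,mapred,bin
```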
						
					