Member since 09-29-2015
123 Posts
216 Kudos Received
47 Solutions

My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
|  | 10310 | 06-23-2016 06:29 PM |
|  | 4037 | 06-22-2016 09:16 PM |
|  | 7319 | 06-17-2016 06:07 PM |
|  | 3950 | 06-16-2016 08:27 PM |
|  | 9761 | 06-15-2016 06:44 PM |

11-18-2015 10:25 PM
4 Kudos

I recommend not setting this in core-site.xml, and instead setting it on the command line invocation specifically for the DistCp command that needs to communicate with the unsecured cluster. Setting it in core-site.xml means that all RPC connections for any application are eligible for fallback to simple authentication. This potentially expands the attack surface for man-in-the-middle attacks. Here is an example of overriding the setting on the command line while running DistCp:

hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo

The command must be run while logged into the secured cluster, not the unsecured cluster.
						
					
11-18-2015 10:07 PM
6 Kudos

If there is no existing documentation covering the JournalNode, then here is my recommendation.

1. Bootstrap the new server by copying the contents of dfs.journalnode.edits.dir from an existing JournalNode (a sketch of steps 1 and 2 follows below).
2. Start JournalNode on the new server.
3. Reconfigure the NameNodes to include the new server in dfs.namenode.shared.edits.dir.
4. Restart standby NN and verify it remains healthy.
5. Restart active NN and verify it remains healthy.
6. Reconfigure the NameNodes to remove the old server from dfs.namenode.shared.edits.dir.
7. Restart standby NN and verify it remains healthy.
8. Restart active NN and verify it remains healthy.

Some might note that during the copy in step 1, it's possible that additional transactions are being logged concurrently, so the copy might be out-of-date immediately. This is not a problem, though. The JournalNode is capable of "catching up" by synchronizing data from other running JournalNodes. In fact, step 1 is really just an optimization of this "catching up".
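
A rough sketch of steps 1 and 2 (the edits directory path and the hadoop-daemon.sh start mechanism are assumptions; substitute whatever your deployment actually uses):

# Step 1: bootstrap the new server by copying the edits directory from an existing JournalNode.
# Assumes dfs.journalnode.edits.dir is /hadoop/hdfs/journal on both hosts.
rsync -a existing-jn-host:/hadoop/hdfs/journal/ /hadoop/hdfs/journal/

# Step 2: start the JournalNode daemon on the new server (Hadoop 2 style daemon script).
hadoop-daemon.sh start journalnode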
						
					
11-18-2015 06:27 PM
1 Kudo

ipc.server.tcpnodelay controls use of Nagle's algorithm on any server component that makes use of Hadoop's common RPC framework. That means that full deployment of a change in this setting would require a restart of any component that uses that common RPC framework. That's a broad set of components, including all HDFS, YARN and MapReduce daemons. It probably also includes other components in the wider ecosystem.
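
As a rough way to check the value a node's configuration files currently carry (note this reads the local configuration, not a running daemon's live setting):

# Prints ipc.server.tcpnodelay as resolved from the local Hadoop configuration, including defaults.
hdfs getconf -confKey ipc.server.tcpnodelay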
						
					
11-18-2015 06:20 PM
1 Kudo

There is currently no way to define a replication factor on a directory and have it cascade down automatically to all child files. Instead of running the daemon process to change the replication factor, do you have the option of setting the replication factor explicitly when you create the file? For example, here is how you can override it while saving a file through the CLI:

> hdfs dfs -D dfs.replication=2 -put hello /hello
> hdfs dfs -stat 'name=%n repl=%r' /hello
name=hello repl=2

If your use case is something like a MapReduce job, then you can override dfs.replication at job submission time too. Creating the file with the desired replication in the first place has an advantage over creating it with replication factor 3 and then retroactively changing it to 2. Creating it with replication factor 3 temporarily wastes disk space, and changing it to replication factor 2 then creates extra work for the cluster to detect that some blocks are over-replicated and delete the excess replicas.
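
For the MapReduce case, assuming the job's driver uses Tool/GenericOptionsParser so that -D options are honored (the example jar name and paths below are only placeholders):

# Submit a job whose output files are written with replication factor 2.
hadoop jar hadoop-mapreduce-examples.jar wordcount -D dfs.replication=2 /input /output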
						
					
10-30-2015 09:30 PM

@Andrew Grande, thank you. I hadn't considered the IT challenges from the browser side.
						
					
10-30-2015 09:27 PM

@Neeraj, thanks for the reply. In this kind of compliance environment, is there something more that is done to mitigate the lack of authentication on the HTTP servers? Are the HTTP ports firewalled off?
						
					
10-29-2015 09:03 PM
1 Kudo

The ACLs specified in the hadoop-policy.xml file refer to Hadoop service-level authorization:

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ServiceLevelAuth.html

These ACLs are enforced on Hadoop RPC service calls. These ACLs are not applicable to access through WebHDFS. In order to fully control authorization to HDFS files, use HDFS permissions and ACLs:

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html

Permissions and ACLs applied to directories and files are enforced for all means of access to the file system. Other potential solutions are to use Knox or Ranger.
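
For illustration, file-level ACLs can be managed from the CLI like this, assuming ACLs are enabled on the NameNode (dfs.namenode.acls.enabled=true); the user name and path are just placeholders:

# Grant user alice read and execute access on /data, then display the resulting ACL.
hdfs dfs -setfacl -m user:alice:r-x /data
hdfs dfs -getfacl /data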
						
					
10-29-2015 05:24 PM

Activating Hadoop secure mode using Kerberos and activating Hadoop HTTP authentication using SPNEGO are separate configuration steps:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html

This means that it's possible to run a cluster with Kerberos authentication, but leave the HTTP endpoints unauthenticated. Is there any valid use case for running in this configuration? Enabling Kerberos authentication implies a desire for security hardening. Therefore, leaving the HTTP endpoints unauthenticated seems undesirable. I have encountered clusters that had enabled Kerberos but had not enabled HTTP authentication. When I see this, I generally advise that the admins go back and configure HTTP authentication. Am I missing a valid reason why an admin would want to keep running in this mode?
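
As a quick way to see which mode a cluster is in (the host and port below are placeholders for a NameNode web UI address), an unauthenticated HTTP request is rejected with 401 when SPNEGO is enforced and succeeds when the endpoint is left open:

# Expect HTTP 401 if SPNEGO authentication is enforced on the web UI, 200 if it is unauthenticated.
curl -i http://namenode-host:50070/jmx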
						
					
Labels: Apache Hadoop, Kerberos, Security

10-29-2015 05:13 PM
1 Kudo

There is currently no way for a newly created directory in HDFS to set its group from the primary group of the creating user automatically. Instead, it always follows the rule quoted in the question: the group is the group of the parent directory. One way I've handled this in the past is first to create an intermediate directory and then explicitly change its group to the user's primary group, using chgrp from the shell or setOwner in the Java APIs. Then, additional files and directories created by the process would use this as the destination directory. For example, a MapReduce job could specify its output directory under this intermediate directory, and then the output files created by that MapReduce job would have the desired group.
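
For example (the path is a placeholder, and the group name stands in for the user's primary group):

# Create the intermediate directory, then explicitly set its group.
hdfs dfs -mkdir /data/staging
hdfs dfs -chgrp analysts /data/staging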
						
					
10-29-2015 05:08 PM
1 Kudo

Does this question refer to Hadoop Service Level Authorization?

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ServiceLevelAuth.html

If so, then there is no need to restart the NameNode to make changes in service-level ACLs take effect. Instead, an admin can run this command:

hdfs dfsadmin -refreshServiceAcl

More documentation on this command is available here:

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin

There is similar functionality for YARN too:

http://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#rmadmin

Another way to manage this is to declare a single "hadoopaccess" group for use in the service-level ACL definitions. Whenever a new set of users needs access, they would be added to this group. This shifts the management effort to an AD/LDAP administrator. Different IT shops would likely make a different trade-off between managing it that way and managing it in the service-level authorization policy files. Both approaches are valid, and it depends on the operator's preference.
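
If you go the single-group route, one way to confirm that the NameNode resolves a user into the expected group (the user name below is a placeholder):

# Prints the groups for the given user as resolved by the NameNode's group mapping.
hdfs groups alice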
						
					