Member since 09-29-2015
123 Posts
216 Kudos Received
47 Solutions

My Accepted Solutions
| Title | Views | Posted | 
|---|---|---|
|  | 10310 | 06-23-2016 06:29 PM |
|  | 4037 | 06-22-2016 09:16 PM |
|  | 7319 | 06-17-2016 06:07 PM |
|  | 3950 | 06-16-2016 08:27 PM |
|  | 9761 | 06-15-2016 06:44 PM |

11-18-2015 10:25 PM
4 Kudos

I recommend not setting this in core-site.xml, and instead setting it on the command line invocation specifically for the DistCp command that needs to communicate with the unsecured cluster. Setting it in core-site.xml means that all RPC connections for any application are eligible for fallback to simple authentication. This potentially expands the attack surface for man-in-the-middle attacks. Here is an example of overriding the setting on the command line while running DistCp:

hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo

The command must be run while logged into the secured cluster, not the unsecured cluster.
						
					
11-18-2015 10:07 PM
6 Kudos

If there is no existing documentation covering the JournalNode, then here is my recommendation.

1. Bootstrap the new server by copying the contents of dfs.journalnode.edits.dir from an existing JournalNode (a sketch of steps 1 and 2 follows below).
2. Start JournalNode on the new server.
3. Reconfigure the NameNodes to include the new server in dfs.namenode.shared.edits.dir.
4. Restart standby NN and verify it remains healthy.
5. Restart active NN and verify it remains healthy.
6. Reconfigure the NameNodes to remove the old server from dfs.namenode.shared.edits.dir.
7. Restart standby NN and verify it remains healthy.
8. Restart active NN and verify it remains healthy.

Some might note that during the copy in step 1, it's possible that additional transactions are being logged concurrently, so the copy might be out-of-date immediately. This is not a problem, though. The JournalNode is capable of "catching up" by synchronizing data from other running JournalNodes. In fact, step 1 is really just an optimization of this "catching up".
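
A rough sketch of steps 1 and 2 (the edits directory path and the hadoop-daemon.sh start mechanism are assumptions; substitute whatever your deployment actually uses):

# Step 1: bootstrap the new server by copying the edits directory from an existing JournalNode.
# Assumes dfs.journalnode.edits.dir is /hadoop/hdfs/journal on both hosts.
rsync -a existing-jn-host:/hadoop/hdfs/journal/ /hadoop/hdfs/journal/

# Step 2: start the JournalNode daemon on the new server (Hadoop 2 style daemon script).
hadoop-daemon.sh start journalnode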
						
					
11-18-2015 06:27 PM
1 Kudo

ipc.server.tcpnodelay controls use of Nagle's algorithm on any server component that makes use of Hadoop's common RPC framework. That means that full deployment of a change in this setting would require a restart of any component that uses that common RPC framework. That's a broad set of components, including all HDFS, YARN and MapReduce daemons. It probably also includes other components in the wider ecosystem.
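
As a rough way to check the value a node's configuration files currently carry (note this reads the local configuration, not a running daemon's live setting):

# Prints ipc.server.tcpnodelay as resolved from the local Hadoop configuration, including defaults.
hdfs getconf -confKey ipc.server.tcpnodelay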
						
					
11-18-2015 06:20 PM
1 Kudo

There is currently no way to define a replication factor on a directory and have it cascade down automatically to all child files. Instead of running the daemon process to change the replication factor, do you have the option of setting the replication factor explicitly when you create the file? For example, here is how you can override it while saving a file through the CLI:

> hdfs dfs -D dfs.replication=2 -put hello /hello
> hdfs dfs -stat 'name=%n repl=%r' /hello
name=hello repl=2

If your use case is something like a MapReduce job, then you can override dfs.replication at job submission time too. Creating the file with the desired replication in the first place has an advantage over creating it with replication factor 3 and then retroactively changing it to 2. Creating it with replication factor 3 temporarily wastes disk space, and changing it to replication factor 2 then creates extra work for the cluster to detect that some blocks are over-replicated and delete the excess replicas.
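
For the MapReduce case, assuming the job's driver uses Tool/GenericOptionsParser so that -D options are honored (the example jar name and paths below are only placeholders):

# Submit a job whose output files are written with replication factor 2.
hadoop jar hadoop-mapreduce-examples.jar wordcount -D dfs.replication=2 /input /output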
						
					
10-30-2015 09:30 PM

@Andrew Grande, thank you. I hadn't considered the IT challenges from the browser side.
						
					
10-30-2015 09:27 PM

@Neeraj, thanks for the reply. In this kind of compliance environment, is there something more that is done to mitigate the lack of authentication on the HTTP servers? Are the HTTP ports firewalled off?
						
					
10-29-2015 09:03 PM
1 Kudo

The ACLs specified in the hadoop-policy.xml file refer to Hadoop service-level authorization:

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ServiceLevelAuth.html

These ACLs are enforced on Hadoop RPC service calls. These ACLs are not applicable to access through WebHDFS. In order to fully control authorization to HDFS files, use HDFS permissions and ACLs:

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html

Permissions and ACLs applied to directories and files are enforced for all means of access to the file system. Other potential solutions are to use Knox or Ranger.
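
For illustration, file-level ACLs can be managed from the CLI like this, assuming ACLs are enabled on the NameNode (dfs.namenode.acls.enabled=true); the user name and path are just placeholders:

# Grant user alice read and execute access on /data, then display the resulting ACL.
hdfs dfs -setfacl -m user:alice:r-x /data
hdfs dfs -getfacl /data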
						
					
10-29-2015 05:24 PM

Activating Hadoop secure mode using Kerberos and activating Hadoop HTTP authentication using SPNEGO are separate configuration steps:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SecureMode.html

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/HttpAuthentication.html

This means that it's possible to run a cluster with Kerberos authentication, but leave the HTTP endpoints unauthenticated. Is there any valid use case for running in this configuration? Enabling Kerberos authentication implies a desire for security hardening. Therefore, leaving the HTTP endpoints unauthenticated seems undesirable. I have encountered clusters that had enabled Kerberos but had not enabled HTTP authentication. When I see this, I generally advise that the admins go back and configure HTTP authentication. Am I missing a valid reason why an admin would want to keep running in this mode?
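
As a quick way to see which mode a cluster is in (the host and port below are placeholders for a NameNode web UI address), an unauthenticated HTTP request is rejected with 401 when SPNEGO is enforced and succeeds when the endpoint is left open:

# Expect HTTP 401 if SPNEGO authentication is enforced on the web UI, 200 if it is unauthenticated.
curl -i http://namenode-host:50070/jmx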
						
					
Labels: Apache Hadoop, Kerberos, Security

10-29-2015 05:13 PM
1 Kudo

There is currently no way for a newly created directory in HDFS to set its group from the primary group of the creating user automatically. Instead, it always follows the rule quoted in the question: the group is the group of the parent directory. One way I've handled this in the past is first to create an intermediate directory and then explicitly change its group to the user's primary group, using chgrp from the shell or setOwner in the Java APIs. Then, additional files and directories created by the process would use this as the destination directory. For example, a MapReduce job could specify its output directory under this intermediate directory, and then the output files created by that MapReduce job would have the desired group.
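
For example (the path is a placeholder, and the group name stands in for the user's primary group):

# Create the intermediate directory, then explicitly set its group.
hdfs dfs -mkdir /data/staging
hdfs dfs -chgrp analysts /data/staging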
						
					
10-29-2015 05:08 PM
1 Kudo

Does this question refer to Hadoop Service Level Authorization?

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-common/ServiceLevelAuth.html

If so, then there is no need to restart the NameNode to make changes in service-level ACLs take effect. Instead, an admin can run this command:

hdfs dfsadmin -refreshServiceAcl

More documentation on this command is available here:

http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#dfsadmin

There is similar functionality for YARN too:

http://hadoop.apache.org/docs/r2.7.1/hadoop-yarn/hadoop-yarn-site/YarnCommands.html#rmadmin

Another way to manage this is to declare a single "hadoopaccess" group for use in the service-level ACL definitions. Whenever a new set of users needs access, they would be added to this group. This shifts the management effort to an AD/LDAP administrator. Different IT shops would likely make a different trade-off between managing it that way and managing it in the service-level authorization policy files. Both approaches are valid, and it depends on the operator's preference.
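
If you go the single-group route, one way to confirm that the NameNode resolves a user into the expected group (the user name below is a placeholder):

# Prints the groups for the given user as resolved by the NameNode's group mapping.
hdfs groups alice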
						
					