Member since: 09-29-2015
Posts: 123
Kudos Received: 216
Solutions: 47
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 9105 | 06-23-2016 06:29 PM |
| | 3141 | 06-22-2016 09:16 PM |
| | 6231 | 06-17-2016 06:07 PM |
| | 2870 | 06-16-2016 08:27 PM |
| | 6782 | 06-15-2016 06:44 PM |
06-13-2016
06:02 AM
4 Kudos
@ScipioTheYounger, as described in the document you linked, you'd want to change ha.zookeeper.acl in core-site.xml to this:

<property>
<name>ha.zookeeper.acl</name>
<value>sasl:nn:rwcda</value>
</property>

Then, you'd want to run the following to reformat ZooKeeper for NameNode HA, which would reinitialize the znode used by NameNode HA to coordinate automatic failover:

hdfs zkfc -formatZK -force

The tricky part, as you noticed, is getting that command to authenticate with SASL. The ZooKeeper and SASL guide in the Apache documentation discusses implementation and configuration of SASL in ZooKeeper in detail. For this particular command, you can use this procedure. First, create a JAAS configuration file at /etc/hadoop/conf/hdfs_jaas.conf:

Client {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
storeKey=true
useTicketCache=false
keyTab="/etc/security/keytabs/nn.service.keytab"
principal="nn/<HOST>@EXAMPLE.COM";
};

Note that the <HOST> will be different depending on the NameNode hostnames in your environment. Likewise, you'll need to change EXAMPLE.COM to the correct Kerberos realm. Next, edit /etc/hadoop/conf/hadoop-env.sh, and add the following line to enable SASL when running the zkfc command:

export HADOOP_ZKFC_OPTS="-Dzookeeper.sasl.client=true -Dzookeeper.sasl.client.username=zookeeper -Djava.security.auth.login.config=/etc/hadoop/conf/hdfs_jaas.conf -Dzookeeper.sasl.clientconfig=Client ${HADOOP_ZKFC_OPTS}"

Then, run the "hdfs zkfc -formatZK -force" command.
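If you want to confirm the new ACL actually took effect after reformatting, one option is to inspect the HA znode with the ZooKeeper CLI. This is only a sketch: the zkCli.sh path below is typical for HDP installs, /hadoop-ha is the default parent znode (ha.zookeeper.parent-znode), and "mycluster" stands in for your nameservice name; adjust all of these for your environment.

# Connect to one of your ZooKeeper servers (hostname and path are examples).
/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk1.example.com:2181
# Inside the zkCli shell:
#   getAcl /hadoop-ha/mycluster
# You should see the sasl scheme with the nn principal listed, rather than world:anyone.
# Depending on your ZooKeeper version and ACL enforcement, you may need a
# SASL-authenticated client to read the znode.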
06-11-2016
08:46 PM
1 Kudo
Hello @Thiago. It is possible to achieve communication across secured and unsecured clusters. A common use case for this is using DistCp to transfer data between clusters.

As mentioned in other answers, the configuration property ipc.client.fallback-to-simple-auth-allowed=true tells a secured client that it may fall back to unsecured mode when the unsecured server side cannot satisfy authentication. However, I recommend not setting this in core-site.xml, and instead setting it on the command line specifically for the DistCp invocation that needs to communicate with the unsecured cluster. Setting it in core-site.xml means that all RPC connections for any application are eligible for fallback to simple authentication, which potentially expands the attack surface for man-in-the-middle attacks.

Here is an example of overriding the setting on the command line while running DistCp:

hadoop distcp -D ipc.client.fallback-to-simple-auth-allowed=true hdfs://nn1:8020/foo/bar hdfs://nn2:8020/bar/foo

The command must be run while logged into the secured cluster, not the unsecured cluster. This is adapted from one of my prior answers: https://community.hortonworks.com/questions/294/running-distcp-between-two-cluster-one-kerberized.html
06-11-2016
08:09 PM
@Robert Levas, thanks for the great article! May I also suggest adding information about the "hadoop kerbname" or "hadoop org.apache.hadoop.security.HadoopKerberosName" shell command? This is a helpful debugging tool that prints the current principal's short name after Hadoop applies the currently configured auth_to_local rules. If you'd like, feel free to copy-paste my text from this answer: https://community.hortonworks.com/questions/38573/pig-view-hdfs-test-failing-service-hdfs-check-fail.html .
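For readers who haven't used it, a quick illustration (the principal below is hypothetical; this assumes EXAMPLE.COM is your cluster's default realm and the default auth_to_local rules are in effect):

# Prints the short name produced by the configured auth_to_local rules.
# Exact output formatting varies by Hadoop version.
hadoop org.apache.hadoop.security.HadoopKerberosName nn/nn1.example.com@EXAMPLE.COM
# With the default rules, this principal should map to the short name "nn".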
06-08-2016
10:09 PM
2 Kudos
Hello @Mingliang Liu. Nice article! I'd like to add that in step 7, when doing a distro build, I often like to speed it up a little more by passing the argument -Dmaven.javadoc.skip=true. As long as I don't need to inspect JavaDoc changes, this can make the build complete faster.
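For example, a full distro build command might look like the following; this is based on the standard build described in Hadoop's BUILDING.txt, so adjust the profiles (e.g. adding -Pnative) to match whatever the article's step 7 uses:

# Build the binary distribution, skipping tests and JavaDoc generation.
mvn clean package -Pdist -Dtar -DskipTests -Dmaven.javadoc.skip=true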
06-08-2016
07:06 PM
3 Kudos
Hello @Xavier LIEUTIER. These log messages indicate that there was a timeout condition when the NameNode attempted to call the JournalNodes. The NameNode must successfully call a quorum of JournalNodes: at least 2 out of 3. This means that the call timed out to at least 2 out of 3 of them. This is a fatal condition for the NameNode, so by design, it aborts.

There are multiple potential reasons for this timeout condition. Reviewing logs from the NameNodes and JournalNodes would likely reveal more details. There are several common causes to watch for:

- A long stop-the-world garbage collection pause may surpass the timeout threshold for the call. Garbage collection logging would show what kind of garbage collection activity the process is doing, and you might also see log messages about the "JvmPauseMonitor". Consider reviewing the article NameNode Garbage Collection Configuration: Best Practices and Rationale to make sure your cluster's heap and garbage collection settings match best practices. (A sketch for enabling GC logging follows this list.)
- In environments that integrate with LDAP for resolution of users' group memberships, load problems on the LDAP infrastructure can cause delays. In extreme cases, we have seen such timeouts at the JournalNodes cause edit logging calls to fail, which causes a NameNode abort and an HA failover. See Hadoop and LDAP: Usage, Load Patterns and Tuning for a more detailed description and potential mitigation steps.
- It is possible that there is a failure in network connectivity between the NameNode and the JournalNodes. This tends to be rare, because NameNodes and JournalNodes tend to be colocated on the same host or placed relatively close to one another in the network topology. Still, it is worth verifying that basic network connectivity between all NameNode hosts and all JournalNode hosts is working fine.
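Here is a minimal sketch of enabling GC logging for the NameNode via hadoop-env.sh so that pause activity becomes visible. The log path is only an example, and these particular flags are the Java 7/8 style GC logging options, so adapt them to your JVM and environment:

# Append GC logging flags to the NameNode's JVM options in hadoop-env.sh.
export HADOOP_NAMENODE_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/hadoop/hdfs/namenode-gc.log ${HADOOP_NAMENODE_OPTS}"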
06-08-2016
06:51 PM
13 Kudos
LDAP Usage
Hadoop may be configured to use LDAP as the source for resolving an authenticated user's list of group memberships. A common example where Hadoop needs to resolve group memberships is permission checking performed by HDFS at the NameNode. The Apache documentation's HDFS Permissions Guide contains further discussion of how the group mapping works: the NameNode calls a configurable plugin to get the user's group memberships before checking permissions. Despite that document's focus on group resolution at the NameNode, many other Hadoop processes also call the group mapping, so the information in this article applies to the entire ecosystem of Hadoop-related components.

As described in that document, the exact implementation of the group mapping is configurable. Here is the documentation of the configuration property from core-default.xml and its default value.

<property>
<name>hadoop.security.group.mapping</name>
<value>org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback</value>
<description>
Class for user to group mapping (get groups for a given user) for ACL.
The default implementation,
org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback,
will determine if the Java Native Interface (JNI) is available. If JNI is
available the implementation will use the API within hadoop to resolve a
list of groups for a user. If JNI is not available then the shell
implementation, ShellBasedUnixGroupsMapping, is used. This implementation
shells out to the Linux/Unix environment with the
<code>bash -c groups</code> command to resolve a list of groups for a user.
</description>
</property>

LDAP integration arises from several possible configuration scenarios:

- hadoop.security.group.mapping=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback, and the host OS integrates directly with LDAP, such as via pam_ldap. A Hadoop process will look up group memberships via standard syscalls, and those syscalls will delegate to pam_ldap.
- hadoop.security.group.mapping=org.apache.hadoop.security.LdapGroupsMapping. A Hadoop process will call the LDAP server directly. This can be useful if the host OS cannot integrate with LDAP for some reason. As a side effect, it is possible that Hadoop will see a different list of group memberships for a user compared to what the host OS reports, such as by running the "groups" command at the shell.
- Since group mapping is pluggable, it is possible (though rare) that a deployment has configured hadoop.security.group.mapping as a custom implementation of the org.apache.hadoop.security.GroupMappingServiceProvider interface. In that case, the integration pattern will vary depending on the implementation details.

Troubleshooting Group Membership

If there is any doubt about how Hadoop is resolving a user's group memberships, then a helpful troubleshooting step is to run the following command while logged in as the user. This will print authentication information for the current user, including group memberships, as they are really seen by the Hadoop code.

> hadoop org.apache.hadoop.security.UserGroupInformation
Getting UGI for current user
User: chris
Group Ids:
Groups: staff everyone localaccounts _appserverusr admin _appserveradm _lpadmin _appstore _lpoperator _developer com.apple.access_screensharing com.apple.access_ssh
UGI: chris (auth:SIMPLE)
Auth method SIMPLE
Keytab false
============================================================

However, in the case of HDFS file permissions, recall that the group resolution really occurs at the NameNode before it checks authorization for the user. If configuration is different at the NameNode compared to the client host, then it's possible that the NameNode will see different results for the group memberships. To see the NameNode's opinion of the user's group memberships, run the following command.

> hdfs groups
chris : staff everyone localaccounts _appserverusr admin _appserveradm _lpadmin _appstore _lpoperator _developer com.apple.access_screensharing com.apple.access_ssh

Load Patterns

Hadoop is a distributed system running across hundreds or thousands of nodes, all independently resolving users' group memberships, so this usage pattern may generate unexpectedly high call volume against the LDAP infrastructure. Typical symptoms are slow responses from the LDAP server, perhaps resulting in timeouts. If group resolution takes too long, then the Hadoop process may log a message like this:

2016-06-07 13:07:00,831 WARN security.Groups (Groups.java:getGroups(181)) - Potential performance problem: getGroups(user=chris) took 13018 milliseconds.

The threshold for this warning is configurable, with a default value of 5 seconds.

<property>
<name>hadoop.security.groups.cache.warn.after.ms</name>
<value>5000</value>
<description>
If looking up a single user to group takes longer than this amount of
milliseconds, we will log a warning message.
</description>
</property>

Impacts

The exact impact to the Hadoop process varies. In many cases, such as execution of a YARN container running a map task, the delay simply increases total latency of execution for that container.

A more harmful case is slow lookup at the HDFS JournalNode. If multiple JournalNodes simultaneously experience a long delay in group resolution, then it's possible to exceed the NameNode's timeout for JournalNode calls. The NameNode must be able to log edits to a quorum of JournalNodes (i.e. 2 out of 3 JournalNodes). If the calls time out to 2 or more JournalNodes, then it's a fatal condition: the NameNode must be able to log transactions successfully, and if it cannot, it aborts intentionally. This would trigger an unwanted HA failover, and the problem might recur after failover, resulting in flapping. If this happens, then the JournalNode logs will show the "performance problem" warning mentioned above, and the NameNode logs will show a message about "Timed out waiting for a quorum of nodes to respond" before a FATAL shutdown error.

Tuning

If your cluster is encountering problems due to high load on the LDAP infrastructure, then there are several possible ways to mitigate this by tuning the Hadoop deployment.

In-Process Caching

Hadoop supports in-process caching of group membership resolution data. There are several configuration properties that control the behavior of the cache. Tuning these properties may help mitigate LDAP load issues.

<property>
<name>hadoop.security.groups.cache.secs</name>
<value>300</value>
<description>
This is the config controlling the validity of the entries in the cache
containing the user->group mapping. When this duration has expired,
then the implementation of the group mapping provider is invoked to get
the groups of the user and then cached back.
</description>
</property>

<property>
<name>hadoop.security.groups.negative-cache.secs</name>
<value>30</value>
<description>
Expiration time for entries in the negative user-to-group mapping
caching, in seconds. This is useful when invalid users are retrying
frequently. It is suggested to set a small value for this expiration, since
a transient error in group lookup could temporarily lock out a legitimate
user.
Set this to zero or negative value to disable negative user-to-group caching.
</description>
</property>

The NameNode and ResourceManager provide administrative commands for forcing invalidation of the in-process group cache. This can be useful for propagating group membership changes without requiring a restart of the NameNode or ResourceManager process.

> hdfs dfsadmin -refreshUserToGroupsMappings
Refresh user to groups mapping successful

> yarn rmadmin -refreshUserToGroupsMappings
16/06/08 11:38:20 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8033

External Caching with Name Service Cache Daemon

If the host OS integrates with LDAP (e.g. hadoop.security.group.mapping=org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback and the host OS uses pam_ldap), then the Name Service Cache Daemon (nscd) is an effective approach for caching group memberships at the OS layer. Note that this approach is superior to Hadoop's in-process caching, because nscd allows multiple Hadoop processes running on the same host to share a common cache and avoid repeated lookups across different processes. However, nscd is unlikely to be beneficial if hadoop.security.group.mapping=org.apache.hadoop.security.LdapGroupsMapping, because Hadoop processes will issue their own LDAP calls directly instead of delegating to the host OS.

Static Group Mapping

Hadoop also supports specifying a static mapping of users to their group memberships in core-site.xml configuration.

<property>
<name>hadoop.user.group.static.mapping.overrides</name>
<value>dr.who=;</value>
<description>
Static mapping of user to groups. This will override the groups if
available in the system for the specified user. In other words, groups
look-up will not happen for these users, instead groups mapped in this
configuration will be used.
Mapping should be in this format.
user1=group1,group2;user2=;user3=group2;
Default, "dr.who=;" will consider "dr.who" as user without groups.
</description>
</property>

This approach completely bypasses LDAP (or any other group lookup mechanism) for the specified users. A drawback is that administrators lose centralized management of group memberships through LDAP for those users. In practice, this is not a significant drawback for the HDP service principals, which generally don't change their group memberships. For example:

<property>
<name>hadoop.user.group.static.mapping.overrides</name>
<value>hive=hadoop,hive;hdfs=hadoop,hdfs;oozie=users,hadoop,oozie;knox=hadoop;mapred=hadoop,mapred;zookeeper=hadoop;falcon=hadoop;sqoop=hadoop;yarn=hadoop;hcat=hadoop;ams=hadoop;root=hadoop;ranger=hadoop;rangerlogger=hadoop;rangeradmin=hadoop;ambari-qa=hadoop,users;</value>
</property>

Static mapping is particularly effective at mitigating the problem of slow group lookups at the JournalNode discussed earlier. JournalNode calls are made almost exclusively by the hdfs service principal, so listing it in the static mapping removes the need for the JournalNode to call LDAP.

Note that any of the configuration tuning described above requires a restart of the relevant Hadoop process (such as the NameNode or JournalNode) for the change to take effect.
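As a quick sanity check after restarting with a static mapping in place, you can ask the NameNode for the hdfs user's groups. This is only a sketch; the expected output assumes the example mapping shown above:

> hdfs groups hdfs
# With the example static mapping in effect at the NameNode, this should print:
# hdfs : hadoop hdfs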
03-24-2016
06:23 AM
4 Kudos
@Raja Ray, I recommend checking if your NameNode host is running out of disk space. Here is the main thing I noticed in that log:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.SafeModeException): Cannot renew lease for DFSClient_hb_rs_fsdata1c.corp.arc.com,60020,1452067957740_752162766_33. Name node is in safe mode. Resources are low on NN. Please add or free up more resources then turn off safe mode manually. NOTE: If you turn off safe mode before adding resources, the NN will immediately return to safe mode. Use "hdfs dfsadmin -safemode leave" to turn safe mode off.

The NameNode periodically checks whether there is disk space remaining on all of the volumes it uses for writing edit logs. If not, then it enters safe mode automatically as a precaution.
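A quick way to check is to look at free space on the volumes backing dfs.namenode.name.dir (and dfs.namenode.edits.dir, if set separately). This is only a sketch, and the path below is an example:

# Show free space on the filesystem holding the NameNode metadata directory.
# Replace the path with the value of dfs.namenode.name.dir from your hdfs-site.xml.
df -h /hadoop/hdfs/namenode

By default, the NameNode requires roughly 100 MB free on each required volume (controlled by dfs.namenode.resource.du.reserved); once enough space is freed, you can leave safe mode manually as the message describes.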
03-17-2016
10:33 PM
1 Kudo
@Sagar Shimpi, the NameNode will not persist or otherwise remember who is a super-user or a member of the super-group across process restarts. In your example, after setting dfs.permissions.superusergroup=hdfs3 and restarting the NameNode, only members of the hdfs3 group (i.e. user test3) would have super-user rights. (Also, the user ID that launched the NameNode process is always the super-user. That part is not changed by configuration.)
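For reference, a minimal hdfs-site.xml sketch matching that example (the group name hdfs3 is taken from your scenario):

<property>
  <name>dfs.permissions.superusergroup</name>
  <value>hdfs3</value>
</property>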
03-14-2016
04:42 PM
4 Kudos
Hello @Sagar Shimpi. Yes, this has been tested successfully. After changing dfs.permissions.superusergroup in hdfs-site.xml, it would require a NameNode restart for the change to take effect. If this cluster uses NameNode HA with QuorumJournalManager, then both NameNodes need to be restarted.

If that still doesn't work, then a helpful troubleshooting step would be to try running "hdfs groups <username>", where <username> is the user that you have added to the group that you want to be the HDFS supergroup. This command will print out a list of that user's group memberships, as perceived by the NameNode. If the list does not show your configured supergroup, then this indicates there is some kind of misconfiguration. Perhaps the user has not really been added to the group, or perhaps there is some custom group mapping in effect for your cluster that is not behaving as expected. More information on how group mapping works is available here: http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#Group_Mapping
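For example (the user and group names here are hypothetical):

> hdfs groups someuser
# The output lists that user's groups as the NameNode sees them, e.g.:
# someuser : users hdfsadmins
# If the configured supergroup does not appear here, the group mapping is the problem.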
03-09-2016
06:33 PM
1 Kudo
The autopurge.snapRetainCount configuration setting is described here: https://zookeeper.apache.org/doc/r3.4.8/zookeeperAdmin.html#sc_advancedConfiguration

Old snapshots and transaction logs represent earlier points in time for the state saved in a ZooKeeper cluster, similar to a backup. Tuning autopurge.snapRetainCount is therefore a trade-off between disk space consumption and the flexibility to restore to an earlier point in time. In practice, I have not needed to restore state to an earlier point in time like this. Typical applications of ZooKeeper use it for transient coordination data that can be recreated easily from first principles in the event of a disaster, so going back to an earlier point in time generally isn't very important. For that reason, I haven't needed to increase the retention count.
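For reference, a minimal zoo.cfg sketch of the autopurge settings; the values shown are the defaults described in the linked documentation:

# Number of most recent snapshots (and corresponding transaction logs) to retain.
autopurge.snapRetainCount=3
# Purge task interval in hours; 0 disables automatic purging.
autopurge.purgeInterval=0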