About sshimpi

sshimpi · ‎12-25-2016

SYMPTOM: When trying to enable the Ranger plugin for the HIVE component and then clicking SAVE, it does not save this as a new configuration. Certain Hive Smart Config changes cannot be made (set Authorization to Ranger, etc) when Oracle JDBC URL contains a non-standard port. ROOT CAUSE: https://hortonworks.jira.com/browse/BUG-50133 WORKAROUND: Reach out to Hortonworks Support for a hotfix patch and instructions to address this issue.

sshimpi · ‎12-25-2016

SYMPTOM: Nagios install fails due to existing "nagios" user in the ldap system trying to install ambari on single node cluster but is having issues with nagios since they have an ldap server with a nagios user already created. ERROR: Receiving error; "err: /Stage2/Hdp-nagios::Server/Hdp::Usernagios/Usernagios/gid: change from 20000 to nagios failed: Could not set gid on usernagios: Execution of '/usr/sbin/usermod -g 492 nagios' returned 6: usermod: user 'nagios' does not exist in /etc/passwd ROOT CAUSE: This is BUG - https://hortonworks.jira.com/browse/BUG-6787 RESOLUTION / WORKAROUND: Workaround #1: That user must be in group = nagios as well, or the nagios install fails. The workaround for this is to add nagios user to a nagios group. Workaround #2: During install, customize nagios user (Customize Services > Misc) and use something other than nagios user. This user will be created and put in a group nagios. This is available in HDP 1.3.1

sshimpi · ‎12-25-2016

SYMPTOM: External group synced from Unix/LDAP/AD system are labelled as ‘Internal’ in Ranger Admin UI. Currently, when I check under Ranger -> Settings -> Groups, I see around 1700 groups. And all the groups are shown as internal. However, when I login to the Linux hosts I see only 77 groups. I am wondering how to identify from UI which groups are internal and external as all the groups are shown as internal. ERROR: 1. After Ranger and Usersync installation, started Ranger admin and Usersync to sync UNIX OS users. 2. Logged in to Ranger Admin Dashboard and visited User/Group section, then clicked on Group tab. External group synced from Unix/LDAP/AD system are labelled as ‘Internal’ in Ranger Admin UI. ROOT CAUSE: This bug is targeted for inclusion in the Release Notes>Known Issue in HDP 2.3.4, as it has fixVersion in (Dal-M21, Dal-M30, DAL-M30, Dal-Next) https://hortonworks.jira.com/browse/BUG-47662 [HWX internal url for tracking purpose] https://reviews.apache.org/r/42146/ RESOLUTION: Upgrading to HDP - 2.4.2.0 resolved the issue.

sshimpi · ‎12-25-2016

Question: We would like to Understand the mechanism behind the properties yarn.scheduler.capacity.node-locality-delay ==> All the YARN schedulers try to honor locality requests. On a busy cluster, if an application requests a particular node, there is a good chance that other containers are running on it at the time of the request. The obvious course of action is to immediately loosen the locality requirement and allocate a container on the same rack. However, it has been observed in practice that waiting a short time (no more than a few seconds) can dramatically increase the chances of being allocated a container on the requested node, and therefore increase the efficiency of the cluster. This feature is called delay scheduling, and it is supported by both the Capacity Scheduler and the Fair Scheduler. Every node manager in a YARN cluster periodically sends a heartbeat request to the resource manager—by default, one per second. Heartbeats carry information about the node manager’s running containers and the resources available for new containers, so each heartbeat is a potential scheduling opportunity for an application to run a container. When using delay scheduling, the scheduler doesn’t simply use the first scheduling opportunity it receives, but waits for up to a given maximum number of scheduling opportunities to occur before loosening the locality constraint and taking the next scheduling opportunity. For the Capacity Scheduler, delay scheduling is configured by setting yarn.scheduler.capacity.node-locality-delay to a positive integer representing the number of scheduling opportunities that it is prepared to miss before loosening the node constraint to match any node in the same rack. I do see an interesting explanation on this with below article. Please do read it. http://johnjianfang.blogspot.in/2014/08/delay-scheduling-in-capacity-scheduling.html

sshimpi · ‎12-25-2016

SYMPTOM: While following steps for HDP pre-upgrade activity the active namenode went down while issuing - #hdfs dfsadmin -savenamespace" command. Below was the error - ================================================ hdfs@namenode1~$ hdfs dfsadmin -saveNamespace saveNamespace: Call From namenode1.example.com/10.160.81.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused ================================================ ERROR: 2016-06-14 02:18:49,774 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused 2016-06-14 02:18:51,774 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: namenode1.example.com/10.10.20.30:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS) 2016-06-14 02:18:51,775 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused 2016-06-14 02:18:53,776 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: namenode1.example.com/10.10.20.30:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS) 2016-06-14 02:18:53,777 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused 2016-06-14 02:18:55,778 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: namenode1.example.com/10.10.20.30:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS) 2016-06-14 02:18:55,778 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused OR ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Swallowing exception in NameNodeEditLogRoller: java.lang.IllegalStateException: Bad state: BETWEEN_LOG_SEGMENTS at com.google.common.base.Preconditions.checkState(Preconditions.java:172) at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getCurSegmentTxId(FSEditLog.java:493) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$NameNodeEditLogRoller.run(FSNamesystem.java:4358) at java.lang.Thread.run(Thread.java:745) ROOT CAUSE: This is a BUG https://issues.apache.org/jira/browse/HDFS-7871 and it has been fixed in HDP 2.2.9 and HDP 2.4. RESOLUTION: Upgrading to HDP 2.4.0.0-169 resolved the issue.

sshimpi · ‎12-25-2016

SYMPTOM : While installing Ranger through Ambari , on Oracle DB, user runs into error "sqlplus64: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory" ERROR: sqlplus64: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory ROOT CAUSE : possible missing environment settings and misconfiguration in Ambari . RESOLUTION : 1. On Ambari Server Machine, please check environment variable in Linux OS level, verify environmental path LD_LIBRARY_PATH=/usr/lib/oracle/<version>/client64/lib is set. 2. AND in Ambari config -> ranger config - advanced ranger-env, set property "oracle_home" with "/usr/lib/oracle/<version>/client64/lib"

sshimpi · ‎12-25-2016

SYMPTOM: During an upgrade to Ambari 2.1.2 and HDP 2.3.2 the upgrade fails when trying to start the namenode service. Errors in the stack trace indicate that the xasecure files are not found. ISSUE: When Ranger has been installed outside of Ambari, the method used for previous HDP versions, the Ambari database does not contain information about Ranger. As a result the configuration files are not replicated to the new directory structure, /etc/hadoop/<version>/0/. Since the Ranger properties are included in other component configuration files the namenode service will not start. A result of the missing configuration files for Ranger. RESOLUTION: If Ranger was installed outside of Ambari you can replicate the xasecure files from the previous version to the current version configuration directory. cp /etc/hadoop/conf/xasecure* /etc/hadoop/<version>/0/ ie. cp /etc/hadoop/conf/xasecure* /etc/hadoop/2.3.2.0-2950/0/ ** You will need to complete this on all namenode servers in the cluster After you complete this you will be able to start the nameservice using the following command. /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode This will allow you to continue the upgrade process.

abhishek_c · ‎07-17-2018

Hi Sagar, Thank you for providing details on automating the ldap-sys. It worked wonderfully. Thanks, Abhishek

sshimpi · ‎12-24-2016

SYMPTOM: We had problem with namenode starts. We can check if there is namenode startup issue then we can check for gc related errors in logs and same time check for list of files/blocks customer is having on cluster and the recommended HEAP size as per hortonworks manual - https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html ROOT CAUSE: GC errors were causing the issue for namenode to start RESOLUTION: Below are the steps taken to resolve the issue - 1. Logged in to the namenode cli 2. When checked from cli using “ps –ref |grep –i namenode”, the namenode was not displaying. 3. Seems that the namenode process was getting killed after specific interval of time, but ambari was still showing the namenode process state in UI as “starting” 4. Cancelled the namenode starting process from Ambari UI. 5. Tried starting the whole HDFS process and simultaneously ran “IOSTAT” on the fsimage disk. 6, We found within iostat output the “Blk_read/s” was not displaying any value. 7. The namenode process was still getting killed. 8. We tried to enable debug using “export HADOOPROOTLOGGER=DEBUG,console” and ran the command “hadoop namenode” 9. We found that the namenode was have GC issue from the above command logs. 10. We suggested customer to increase Namenode HEAP SIZE from 3Gb to 4Gb and customer was able to start the Namenodes. 11. As per namenode heap recommendations “https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html” 12. Increased HEAP size for namenode to “5376m”to "8072m" as there was approx 10million files on cluster.

sshimpi · ‎12-24-2016

Problem Statement: when doing an ambari-server sync-ldap -groups=<your file> It will bring over the groups but not the users in it. ROOT CAUSE: When troubleshooting why the group members are not being sync'd with FreeIPA, a packet trace helped identify the issue. With ActiveDirectory the user's DN is exposed as an attribute: "distinguishedName", this is not the case inFreeIPA/RHEL IDM (using 389 DS for the directory server implementation). The DN is not an attribute on the user, and cannot be used in a filter like this: (&(objectClass=posixaccount)(|(dn=uid=dstreev,cn=users,cn=accounts,dc=hdp,dc=local)(uid=uid=dstreev,cn=users,cn=accounts,dc=hdp,dc=local))) If we want to retrieve a specific object by DN we have to set the DN as the search base and do a base search scope. ldapsearch -H ldap://ad.hortonworks.local:389 -x -D "CN=hadoopsvc,CN=Users,dc=hortonworks,dc=local" -W -b "CN=paul,CN=Users,DC=hortonworks,DC=local" -s base -a always "(objectClass=user)" In this case I'm looking for the user with DN: CN=paul,CN=Users,DC=hortonworks,DC=local. My bind user is hadoopsvc, and because this is AD my objectClass is user. RESOLUTION: This is a known bug: https://hortonworks.jira.com/browse/BUG-45536 (this link is an internal Hortonworks link and it's published here for reference purposes) There is no workaround, this is fixed in 2.1.3 version of ambari, per the bug.

Online	Offline
Last Visited	‎12-07-2017 08:26 AM

Member Since	‎02-08-2016 09:06 AM
Last Visited	‎12-07-2017 08:26 AM
Posts	793
Kudos received	667

Cloudera Community

Re: Issue with Ranger User/group sync

Re: Ranger HDFS test connection fails

Re: Error while configuring NameNode High Availabi...

Re: Ranger policies on HDFS

Re: Can we do column value level restriction in Ap...

While enabling the Ranger plugin for Hive it is fa...

Nagios install fails due to existing "nagios" user...

All LDAP/AD groups shown as "Internal" users, and ...

Understanding the mechanism behind the properties ...

Active namenode getting down while we issue "hdfs ...

Fail to install Ranger through Ambari when using O...

Upgrade fails with Ranger error, Namenode not able...

Re: How to automate ldap-sync for ambari

We had problem with namenode starts

Openldap with sync-ldap --groups does not work