Member since: 02-08-2016
Posts: 793
Kudos Received: 669
Solutions: 85
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2470 | 06-30-2017 05:30 PM
 | 3202 | 06-30-2017 02:57 PM
 | 2654 | 05-30-2017 07:00 AM
 | 3107 | 01-20-2017 10:18 AM
 | 6213 | 01-11-2017 02:11 PM
12-25-2016
10:07 AM
3 Kudos
SYMPTOM: When enabling the Ranger plugin for the HIVE component and clicking SAVE, Ambari does not save the change as a new configuration. Certain Hive Smart Config changes (such as setting Authorization to Ranger) cannot be made when the Oracle JDBC URL contains a non-standard port.
ROOT CAUSE: https://hortonworks.jira.com/browse/BUG-50133 (internal Hortonworks JIRA, referenced here for tracking purposes)
WORKAROUND:
Reach out to Hortonworks Support for a hotfix patch and instructions to address this issue.
12-25-2016
08:30 AM
3 Kudos
SYMPTOM: The Nagios install fails during an Ambari install on a single-node cluster because a "nagios" user already exists in the LDAP system. ERROR: "err: /Stage2/Hdp-nagios::Server/Hdp::Usernagios/Usernagios/gid: change from 20000 to nagios failed: Could not set gid on usernagios: Execution of '/usr/sbin/usermod -g 492 nagios' returned 6: usermod: user 'nagios' does not exist in /etc/passwd"
ROOT CAUSE: This is BUG - https://hortonworks.jira.com/browse/BUG-6787
RESOLUTION / WORKAROUND:
Workaround #1: The existing user must also be in a group named nagios, or the Nagios install fails. Add the nagios user to a nagios group (see the verification sketch below).
Workaround #2: During install, customize the Nagios user (Customize Services > Misc) and use something other than "nagios". This user will be created and placed in a nagios group. This option is available starting in HDP 1.3.1.
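A minimal verification sketch for Workaround #1, assuming the "nagios" user is managed in LDAP rather than in /etc/passwd; the actual group change must be made wherever the account is managed (LDAP/IPA or locally):
getent passwd nagios   # confirm where the nagios user resolves from (LDAP vs. /etc/passwd)
getent group nagios    # check whether a nagios group exists and who its members are
id nagios              # list the groups the nagios user currently belongs to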
12-25-2016
08:10 AM
3 Kudos
SYMPTOM: External groups synced from a Unix/LDAP/AD system are labelled as ‘Internal’ in the Ranger Admin UI. Under Ranger -> Settings -> Groups there are around 1700 groups, and all of them are shown as internal. However, logging in to the Linux hosts shows only 77 groups. There is no way to identify from the UI which groups are internal and which are external, since all of them are shown as internal.
ERROR: 1. After installing Ranger and Usersync, start Ranger Admin and Usersync to sync UNIX OS users. 2. Log in to the Ranger Admin dashboard, open the User/Group section, and click the Group tab. External groups synced from the Unix/LDAP/AD system are labelled as ‘Internal’ in the Ranger Admin UI.
ROOT CAUSE: This bug is targeted for inclusion in the Release Notes > Known Issues for HDP 2.3.4, as it has fixVersion in (Dal-M21, Dal-M30, Dal-Next): https://hortonworks.jira.com/browse/BUG-47662 [HWX internal URL, for tracking purposes]
https://reviews.apache.org/r/42146/
RESOLUTION: Upgrading to HDP 2.4.2.0 resolved the issue.
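As a way to check a group's origin outside the UI once on a fixed version, Ranger's user/group REST endpoint can be queried. This is a hedged sketch: the host and credentials are placeholders, and the groupSource field values (0 = internal, 1 = external) reflect recent Ranger versions and may differ in older releases.
curl -s -u admin:admin 'http://RANGER_HOST:6080/service/xusers/groups?pageSize=200'
# In the JSON response, each group's groupSource field indicates its origin: 0 = internal, 1 = external (synced)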
12-25-2016
07:56 AM
4 Kudos
Question: We would like to understand the mechanism behind the property yarn.scheduler.capacity.node-locality-delay.
Answer: All the YARN schedulers try to honor locality requests. On a busy cluster, if an application requests a particular node, there is a good chance that other containers are running on it at the time of the request. The obvious course of action is to immediately loosen the locality requirement and allocate a container on the same rack. However, it has been observed in practice that waiting a short time (no more than a few seconds) can dramatically increase the chances of being allocated a container on the requested node, and therefore increase the efficiency of the cluster. This feature is called delay scheduling, and it is supported by both the Capacity Scheduler and the Fair Scheduler.
Every node manager in a YARN cluster periodically sends a heartbeat request to the resource manager—by default, one per second. Heartbeats carry information about the node manager’s running containers and the resources available for new containers, so each heartbeat is a potential scheduling opportunity for an application to run a container.
When using delay scheduling, the scheduler doesn’t simply use the first scheduling opportunity it receives, but waits for up to a given maximum number of scheduling opportunities to occur before loosening the locality constraint and taking the next scheduling opportunity.
For the Capacity Scheduler, delay scheduling is configured by setting yarn.scheduler.capacity.node-locality-delay to a positive integer representing the number of scheduling opportunities the scheduler is prepared to miss before loosening the node constraint to match any node in the same rack. There is a good explanation of this behaviour in the article below.
http://johnjianfang.blogspot.in/2014/08/delay-scheduling-in-capacity-scheduling.html
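For reference, a minimal capacity-scheduler.xml snippet illustrating the property; the value 40 is a common default shipped with HDP (roughly one rack's worth of node heartbeats) and should be tuned to the size of the cluster:
<property>
  <name>yarn.scheduler.capacity.node-locality-delay</name>
  <!-- number of missed scheduling opportunities before relaxing from node-local to rack-local -->
  <value>40</value>
</property>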
12-25-2016
07:52 AM
4 Kudos
SYMPTOM: While following the HDP pre-upgrade steps, the active NameNode went down while issuing the "hdfs dfsadmin -saveNamespace" command. Below was the error:
================================================
hdfs@namenode1~$ hdfs dfsadmin -saveNamespace
saveNamespace: Call From namenode1.example.com/10.160.81.30 to namenode1.example.com:8020
failed on connection exception: java.net.ConnectException: Connection refused; For more details see:
http://wiki.apache.org/hadoop/ConnectionRefused
================================================
ERROR: 2016-06-14 02:18:49,774 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2016-06-14 02:18:51,774 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: namenode1.example.com/10.10.20.30:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2016-06-14 02:18:51,775 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2016-06-14 02:18:53,776 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: namenode1.example.com/10.10.20.30:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2016-06-14 02:18:53,777 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
2016-06-14 02:18:55,778 INFO ipc.Client (Client.java:handleConnectionFailure(859)) - Retrying connect to server: namenode1.example.com/10.10.20.30:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)
2016-06-14 02:18:55,778 WARN ha.HealthMonitor (HealthMonitor.java:doHealthChecks(209)) - Transport-level exception trying to monitor health of NameNode at namenode1.example.com/10.10.20.30:8020: Call From namenode1.example.com/10.10.20.30 to namenode1.example.com:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
OR ERROR org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Swallowing exception in NameNodeEditLogRoller:
java.lang.IllegalStateException: Bad state: BETWEEN_LOG_SEGMENTS
at com.google.common.base.Preconditions.checkState(Preconditions.java:172)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.getCurSegmentTxId(FSEditLog.java:493)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem$NameNodeEditLogRoller.run(FSNamesystem.java:4358)
at java.lang.Thread.run(Thread.java:745)
ROOT CAUSE: This is a bug, https://issues.apache.org/jira/browse/HDFS-7871, and it has been fixed in HDP 2.2.9 and HDP 2.4.
RESOLUTION: Upgrading to HDP 2.4.0.0-169 resolved the issue.
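For reference, saveNamespace is only accepted while the NameNode is in safe mode, so the usual pre-upgrade sequence (run as the hdfs user) looks roughly like this; a sketch of the standard commands, not the full upgrade procedure:
hdfs dfsadmin -safemode enter    # put the NameNode into safe mode
hdfs dfsadmin -saveNamespace     # checkpoint the current namespace to a new fsimage
hdfs dfsadmin -safemode leave    # exit safe mode after the checkpoint completes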
12-25-2016
07:35 AM
5 Kudos
SYMPTOM: While installing Ranger through Ambari with an Oracle DB, the user runs into the error "sqlplus64: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory"
ERROR: sqlplus64: error while loading shared libraries: libsqlplus.so: cannot open shared object file: No such file or directory
ROOT CAUSE: Missing environment settings and/or misconfiguration in Ambari.
RESOLUTION:
1. On the Ambari Server machine, check the environment at the Linux OS level and verify that LD_LIBRARY_PATH=/usr/lib/oracle/<version>/client64/lib is set (see the sketch below).
2. In Ambari, under Ranger config > Advanced ranger-env, set the property "oracle_home" to "/usr/lib/oracle/<version>/client64/lib".
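A minimal shell sketch of the check in step 1; <version> is a placeholder for the installed Oracle Instant Client version, and how the variable is persisted (profile script, service environment file, etc.) depends on the environment:
echo $LD_LIBRARY_PATH                                            # verify the current setting
export LD_LIBRARY_PATH=/usr/lib/oracle/<version>/client64/lib:$LD_LIBRARY_PATH
sqlplus64 -V                                                     # should now print the SQL*Plus version instead of the libsqlplus.so error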
12-25-2016
07:24 AM
4 Kudos
SYMPTOM: During an upgrade to Ambari 2.1.2 and HDP 2.3.2, the upgrade fails when trying to start the NameNode service. Errors in the stack trace indicate that the xasecure files are not found.
ISSUE: When Ranger was installed outside of Ambari (the method used for previous HDP versions), the Ambari database does not contain information about Ranger. As a result, the Ranger configuration files are not replicated to the new directory structure, /etc/hadoop/<version>/0/. Since the Ranger properties are referenced in other component configuration files, the NameNode service will not start because of the missing Ranger configuration files.
RESOLUTION: If Ranger was installed outside of Ambari, you can copy the xasecure files from the previous version's configuration directory to the current one.
cp /etc/hadoop/conf/xasecure* /etc/hadoop/<version>/0/
e.g. cp /etc/hadoop/conf/xasecure* /etc/hadoop/2.3.2.0-2950/0/
** You will need to complete this on all NameNode servers in the cluster.
After you complete this you will be able to start the NameNode using the following command:
/usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenode
This will allow you to continue the upgrade process.
07-17-2018
06:29 PM
Hi Sagar, Thank you for providing details on automating the ldap-sync. It worked wonderfully. Thanks, Abhishek
12-24-2016
07:35 PM
5 Kudos
SYMPTOM: The NameNode would not start. When there is a NameNode startup issue, check the logs for GC-related errors and, at the same time, compare the number of files/blocks on the cluster against the recommended heap size in the Hortonworks manual - https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html
ROOT CAUSE: GC errors were preventing the NameNode from starting.
RESOLUTION: Below are the steps taken to resolve the issue:
1. Logged in to the NameNode CLI.
2. When checked from the CLI using "ps -ef | grep -i namenode", the NameNode process was not listed.
3. It seemed that the NameNode process was getting killed after a specific interval of time, but Ambari was still showing the NameNode state in the UI as "starting".
4. Cancelled the NameNode start operation from the Ambari UI.
5. Tried starting the whole HDFS service and simultaneously ran "iostat" on the fsimage disk.
6. Within the iostat output, "Blk_read/s" was not displaying any value.
7. The NameNode process was still getting killed.
8. Enabled debug logging using "export HADOOP_ROOT_LOGGER=DEBUG,console" and ran the command "hadoop namenode" (see the sketch below).
9. The debug output showed that the NameNode was hitting GC issues.
10. We suggested the customer increase the NameNode heap size from 3 GB to 4 GB, after which the NameNodes could be started.
11. Per the NameNode heap recommendations: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_installing_manually_book/content/ref-80953924-1cbf-4655-9953-1e744290a6c3.1.html
12. Increased the NameNode heap size from "5376m" to "8072m", as there were approximately 10 million files on the cluster.
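A minimal sketch of the debug-start check in steps 8 and 9, assuming an HDP layout and the hdfs service user; the foreground NameNode can be stopped with Ctrl-C once enough log output has been captured:
su - hdfs
export HADOOP_ROOT_LOGGER=DEBUG,console
hadoop namenode        # run the NameNode in the foreground and watch the console for GC / heap pressure messages
The heap itself is set through the NameNode Java heap size setting in Ambari (HDFS > Configs), which typically ends up as the -Xmx value in HADOOP_NAMENODE_OPTS in hadoop-env.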
12-24-2016
06:49 PM
2 Kudos
Problem Statement: When running "ambari-server sync-ldap --groups=<your file>", it brings over the groups but not the users in them.
ROOT CAUSE: When troubleshooting why the group members are not being synced with FreeIPA, a packet trace helped identify the issue. With Active Directory the user's DN is exposed as an attribute, "distinguishedName"; this is not the case in FreeIPA/RHEL IdM (which uses 389 DS as the directory server implementation). There, the DN is not an attribute on the user, and cannot be used in a filter like this: (&(objectClass=posixaccount)(|(dn=uid=dstreev,cn=users,cn=accounts,dc=hdp,dc=local)(uid=uid=dstreev,cn=users,cn=accounts,dc=hdp,dc=local)))
If we want to retrieve a specific object by DN, we have to set the DN as the search base and do a base search scope:
ldapsearch -H ldap://ad.hortonworks.local:389 -x -D "CN=hadoopsvc,CN=Users,dc=hortonworks,dc=local" -W -b "CN=paul,CN=Users,DC=hortonworks,DC=local" -s base -a always "(objectClass=user)"
In this case I'm looking for the user with DN CN=paul,CN=Users,DC=hortonworks,DC=local. My bind user is hadoopsvc, and because this is AD my objectClass is user.
RESOLUTION: This is a known bug: https://hortonworks.jira.com/browse/BUG-45536 (an internal Hortonworks link, published here for reference purposes). There is no workaround; per the bug, this is fixed in Ambari 2.1.3.
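For reference, a hedged sketch of the sync invocations discussed above once on a fixed Ambari version; the file paths are placeholders, and the exact flags can be confirmed against the Ambari documentation or the command's help output:
ambari-server sync-ldap --groups=/path/to/groups.txt   # sync the listed groups and their members
ambari-server sync-ldap --users=/path/to/users.txt     # optionally sync an explicit list of users
ambari-server sync-ldap --existing                     # refresh users/groups already known to Ambari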