Member since: 02-08-2016
Posts: 793
Kudos Received: 669
Solutions: 85
12-06-2016
01:02 PM
5 Kudos
SYMPTOM: While performing the NameNode HA setup, the 'hdfs namenode -initializeSharedEdits' step failed with the error below.
ERROR:
[root@localhost conf]# sudo su hdfs -l -c 'hdfs namenode -initializeSharedEdits'
16/11/22 09:43:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = locahost/10.200.206.12
STARTUP_MSG: args = [-initializeSharedEdits]
STARTUP_MSG: version = 2.7.1.2.3.0.0-2557
STARTUP_MSG: classpath = /usr/hdp/2.3.0.0-2557/hadoop/conf:/usr/hdp/2.3.0.0-2557/hadoop/lib/commons-cli-1.2.jar:/us r/hdp/2.3.0.0-2557/hadoop/lib/xmlenc-0.52.jar:/usr/hdp/2.3.0.0-2557/hadoop/lib/jsch-0.1.42.jar:/usr/hdp/2.3.0.0-2557/
.
.
.
STARTUP_MSG: build = git@github.com:hortonworks/hadoop.git -r 9f17d40a0f2046d217b2bff90ad6e2fc7e41f5e1; compiled by 'jenkins' on 2015-07-14T13:08Z
STARTUP_MSG: java = 1.8.0_51
************************************************************/
16/11/22 09:43:56 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
16/11/22 09:43:56 INFO namenode.NameNode: createNameNode [-initializeSharedEdits]
16/11/22 09:43:56 ERROR namenode.NameNode: No shared edits directory configured for namespace null namenode null
16/11/22 09:43:56 INFO util.ExitUtil: Exiting with status 0
16/11/22 09:43:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/10.200.206.12
************************************************************/
ROOT CAUSE: While performing NameNode HA we found that Step 5, "Configure Components", completed within a second, which raised suspicion: it was not stopping services or performing the expected steps.
RESOLUTION: Suspected that the Ambari server was holding stale wizard state cached in its DB. Clearing the Ambari server cache using the steps below resolved the issue.
# log out from the Ambari server UI first
# list the persisted key-value pairs (UI cache)
$ curl -i -u admin:admin -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/persist
# get the cached state
$ curl -i -u admin:admin -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/persist/CLUSTER_CURRENT_STATUS
# reset/clean the cache
$ curl -u admin:admin -H 'X-Requested-By: ambari' -d '{"CLUSTER_CURRENT_STATUS": "{\"clusterState\":\"DEFAULT\"}" }' -X POST 'http://localhost:8080/api/v1/persist'
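The reset call above embeds JSON inside a JSON string, which is easy to get wrong by hand. A minimal Python sketch of building (and optionally sending) the same request; the host and admin:admin credentials mirror the curl example and are illustrative assumptions:

```python
import base64
import json
import urllib.request

def build_reset_payload() -> bytes:
    # The persist API stores values as strings, so the cluster-state JSON
    # must itself be serialized before being embedded in the outer document.
    inner = json.dumps({"clusterState": "DEFAULT"})
    return json.dumps({"CLUSTER_CURRENT_STATUS": inner}).encode("utf-8")

def reset_cluster_state(host="localhost:8080", user="admin", password="admin"):
    # Equivalent of the curl POST above (assumed host/credentials).
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"http://{host}/api/v1/persist",
        data=build_reset_payload(),
        headers={
            "X-Requested-By": "ambari",
            "Authorization": "Basic " + credentials,
        },
        method="POST",
    )
    return urllib.request.urlopen(req)
```

As with the curl version, back up the Ambari server before resetting the cached state.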
11-29-2016
05:40 PM
Found the problem. Investigating hiveserver2.log showed that rangerlogger failed to flush data to the DB due to a permission problem:
Internal Exception: java.sql.SQLException: Access denied for user 'rangerlogger'@'<server>' (using password: YES)
Error Code: 1045
After granting the permissions, it flushed the data successfully. Now I just need to figure out what caused the loss of permissions and how on earth it is related to the server reboot... Thanks for all your help!
12-01-2016
07:36 AM
It was a NameNode connectivity issue; the hosts file on the NameNode had to be fixed.
The first line should be `127.0.0.1 localhost`. In my hosts file it was `127.0.0.1 hostname1 localhost`; I removed `hostname1` and it was fixed.
11-28-2016
01:40 PM
7 Kudos
The Grafana username and password are stored in a sqlite3 database. One approach is to reset the password back to "admin" first; it can then be changed from the Grafana dashboard. The following steps can be used: 1. Log on to the node where Grafana is installed and open the Grafana sqlite3 database:
# sqlite3 /var/lib/ambari-metrics-grafana/grafana.db
sqlite> select salt, password from user;
pyaUhfDzYg|54c7d1ce2eeaa6000bd84407d0f8ab4663dfa575e0a326bc70dc5cab4b864f6677b21879dbf5e33427c88f9160f744b625bf
sqlite> update user set password = '59acf18b94d7eb0694c61e60ce44c110c7a683ac6a8f09580d626f90f4a242000746579358d77dd9e570e83fa24faa88a8a6', salt = 'F3FAxVm33R' where login = 'admin';
sqlite> .exit
2. Once done, edit the Ambari Metrics Server configs and update the Grafana password to "admin".
3. Restart the Ambari Metrics Server.
4. Access the Grafana page using the Quick Links under the Ambari Metrics Server dashboard.
5. Click the Grafana symbol in the top-left corner of the screen and sign in as the admin user with the "admin" password.
6. Click Global Users and edit the admin user to change the password.
7. Once the password is changed, update the Grafana admin password in the Ambari Metrics Server configs as well.
Note: If the password in the Grafana database and in the Ambari Metrics Server configs do not match, errors like the following are observed:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_grafana.py", line 64, in <module>
AmsGrafana().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 535, in restart
self.start(env)
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_grafana.py", line 46, in start
create_ams_datasource()
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_grafana_util.py", line 230, in create_ams_datasource
(response.status, response.reason, data))
resource_management.core.exceptions.Fail: Ambari Metrics Grafana data source creation failed. POST request status: 401 Unauthorized
{"message":"Invalid username or password"}
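For reference, the salt/password pair updated in the sqlite3 session above is a salted PBKDF2-HMAC-SHA256 hash. A sketch of deriving such a hash; the iteration count (10000) and key length (50 bytes) are assumptions based on Grafana's legacy defaults, so verify them against your Grafana version before relying on this:

```python
import hashlib

def grafana_password_hash(password: str, salt: str) -> str:
    # PBKDF2-HMAC-SHA256; iteration count and key length are assumed
    # legacy Grafana defaults (10000 iterations, 50-byte key).
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt.encode(), 10000, 50
    ).hex()

# A hash produced this way is a 100-character hex string, matching the
# length of the value stored in the user table.
print(len(grafana_password_hash("admin", "F3FAxVm33R")))  # 100
```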
04-08-2017
09:26 AM
I tried all the options and am still stuck on this issue 😞
11-19-2016
09:24 PM
4 Kudos
Question: I have installed an HDP cluster using Ambari, with the Ranger service installed and working properly. I enabled the Kafka plugin for Ranger and noticed something a little annoying: if Ranger Admin is down, Kafka takes a long time to start, because it tries to connect to Ranger Admin to get the repository. The error log looks like this:
###
Will retry 74 time(s), caught exception: Connection failed to Ranger Admin. Reason - [Errno 111] Connection refused.. Sleeping for 8 sec(s)
###
Is there a way to decrease the number of retries or the sleep duration between retries? Ranger Admin being down should not have any impact on the components whose plugins are enabled, right?
Findings: Ambari uses the following scripts to return the Ranger admin login check response:
/usr/lib/ambari-agent/lib/resource_management/libraries/functions/ranger_functions_v2.py (and ranger_functions.py)
/usr/lib/ambari-server/lib/resource_management/libraries/functions/ranger_functions_v2.py (and ranger_functions.py)
These scripts have hard-coded values for the retry attempts and the sleep interval, something like the following: {code}
@safe_retry(times=75, sleep_time=8, backoff_factor=1, err_class=Fail, return_on_fail=None)
def check_ranger_login_urllib2(self, url):
"""
:param url: ranger admin host url
:param usernamepassword: user credentials using which repository needs to be searched.
:return: Returns login check response
"""
.
.
.
{code}
So by default Ambari will attempt a total of 75 times, with a sleep interval of 8 seconds, for the Ranger admin login check. If Ranger is down, or does not come up within these attempts, it throws the exception.
REASON FOR THE HARD-CODED VALUES:
1. Blueprint-based deployments need to ensure the starting order of the services.
2. Ranger admin startup time can vary from environment to environment, hence the number of retries was kept high to be safe.
HOW TO:
Q. I would like to decrease these hard-coded values to one minute instead of 10 minutes, i.e. 6 retries with 10 seconds of sleep between retries.
A. The scripts 'ranger_functions_v2.py' (and 'ranger_functions.py') under:
/usr/lib/ambari-agent/lib/resource_management/libraries/functions/
/usr/lib/ambari-server/lib/resource_management/libraries/functions/
control the retry count and sleep timing. Editing the retry attempts and sleep in these scripts can be a temporary workaround. However, altering the Ambari-provided scripts is not recommended without consulting Hortonworks.
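The total wait these hard-coded values produce can be computed directly. The safe_retry decorator itself belongs to Ambari's resource_management library; this small sketch only models its sleep budget, assuming a constant backoff_factor of 1 as in the decorator shown above:

```python
def retry_budget_seconds(times: int, sleep_time: float, backoff_factor: float = 1.0) -> float:
    # Sum the sleep across all retry attempts; with backoff_factor=1
    # the interval between attempts stays constant.
    total, interval = 0.0, float(sleep_time)
    for _ in range(times):
        total += interval
        interval *= backoff_factor
    return total

print(retry_budget_seconds(75, 8))   # Ambari default: 600.0 seconds (10 minutes)
print(retry_budget_seconds(6, 10))   # proposed: 60.0 seconds (1 minute)
```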
11-19-2016
01:43 PM
6 Kudos
PROBLEM STATEMENT: Ambari does not display the service action buttons after login; it displays "Move Master Wizard In Progress". No operation can be performed from Ambari, and the same affects all users.
ROOT CAUSE: A user who logged in to the Ambari UI (acting as admin) started an operation, left it in the middle, and simply logged off from the UI. The operation state was cached by Ambari and hence is reflected for all other users in the UI.
RESOLUTION: Take an Ambari server backup and run the command below from the Ambari node:
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST -d '{"wizard-data":"{\"userName\":\"<username>\",\"controllerName\":\"<controller_name>\"}"}' http://<ambari_host>:8080/api/v1/persist
username = the user for which you are facing the issue
ambari_host = hostname of the Ambari node
controller_name = name of the controller for which you are making the request
Example:
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST -d '{"wizard-data":"{\"userName\":\"admin\",\"controllerName\":\"moveMasterController\"}"}' http://ambari.example.com:8080/api/v1/persist
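The double-escaped body in the curl call above is JSON nested inside a JSON string; producing it programmatically avoids hand-escaping mistakes. A quick sketch (the user and controller names are placeholders, as in the command):

```python
import json

def wizard_data_payload(user_name: str, controller_name: str) -> str:
    # The persist API stores values as strings, so the wizard data is
    # serialized twice: once for the inner object, once for the wrapper.
    inner = json.dumps({"userName": user_name, "controllerName": controller_name})
    return json.dumps({"wizard-data": inner})

print(wizard_data_payload("admin", "moveMasterController"))
```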
11-19-2016
01:42 PM
6 Kudos
PROBLEM STATEMENT: We recently added 40 DataNodes to the cluster, and they went down immediately after being added, with an exception from the rack topology script.
ROOT CAUSE: The customer added the 40 nodes to a cluster running a rack topology script as per the link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/ch05.html The topology script the customer copied from that link was missing a "fi" on the last line, which caused the exception.
RESOLUTION: Populated the corrected rack-topology.sh script on the NameNode and DataNodes, after which the DataNode service was able to start.
11-19-2016
01:41 PM
6 Kudos
PROBLEM STATEMENT: Customer deleted the HDFS repository from the Ranger UI. Tried re-enabling the plugin, but the repository was not getting recreated. Reinstalled Ranger, but still no luck. There was an alert, "Ranger Admin Password check":
Text is: This alert is used to ensure that the Ranger Admin password in Ambari is correct.
Response is: User:amb_ranger_admin credentials on Ambari UI are not in sync with Ranger
As a test, changed ranger_admin_password to admin in Ambari (Configs -> Advanced) and restarted Ranger (not prompted to, but doing so anyway). Also changed the local password on the Ranger admin host:
[root@ip-172-53-51-18 admin]# passwd amb_ranger_admin
Changing password for user amb_ranger_admin.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Did a restart of HDFS, but now HDFS was not coming up.
ERROR: Ranger admin log error: 2016-08-26 11:06:22,917 [http-bio-6080-exec-4] INFO org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:311) - Operation error. response=VXResponse={org.apache.ranger.view.VXResponse@7097c06cstatusCode={1} msgDesc={User is not allowed to update service-def, only Admin can create/update/delete Services} messageList={[VXMessage={org.apache.ranger.view.VXMessage@4718e9efname={OPER_NO_PERMISSION} rbKey={xa.error.oper_no_permission} message={User doesn't have permission to perform this operation} objectId={null} fieldName={null} }]} }
From Ambari startup stderr box:
2016-08-26 11:06:20,803 - Error creating repository. Http status code - 400.
{"statusCode":1,"msgDesc":"User is not allowed to update service-def, only Admin can create/update/delete Services","messageList":[{"name":"OPER_NO_PERMISSION","rbKey":"xa.error.oper_no_permission","message":"User doesn't have permission to perform this operation"}]}
2016-08-26 11:07:08,595 - Error creating repository. Http status code - 400.
{"statusCode":1,"msgDesc":"User is not allowed to update service-def, only Admin can create/update/delete Services","messageList":[{"name":"OPER_NO_PERMISSION","rbKey":"xa.error.oper_no_permission","message":"User doesn't have permission to perform this operation"}]}
2016-08-26 11:07:56,368 - Error creating repository. Http status code - 400.
{"statusCode":1,"msgDesc":"User is not allowed to update service-def, only Admin can create/update/delete Services","messageList":[{"name":"OPER_NO_PERMISSION","rbKey":"xa.error.oper_no_permission","message":"User doesn't have permission to perform this operation"}]}
2016-08-26 11:29:02,647 [http-bio-6080-exec-2] INFO org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:311) - Operation error. response=VXResponse={org.apache.ranger.view.VXResponse@6189b1d4statusCode={1} msgDesc={User is not allowed to update service-def, only Admin can create/update/delete Services} messageList={[VXMessage={org.apache.ranger.view.VXMessage@6291937cname={OPER_NO_PERMISSION} rbKey={xa.error.oper_no_permission} message={User doesn't have permission to perform this operation} objectId={null} fieldName={null} }]} }
ROOT CAUSE: The role of the 'amb_ranger_admin' user was not set to Admin in the Ranger UI.
RESOLUTION: Changing the role of the 'amb_ranger_admin' user to Admin in the Ranger UI resolved the issue.
11-19-2016
01:41 PM
6 Kudos
PROBLEM STATEMENT: We have a strange problem with Ranger. When I run "select * from <table>;" I can see in the Ranger Hive audit that my user (dnid) is getting logged correctly.
But when I look at the same operation in the HDFS audit, it shows that another user made the request.
This is very strange to me; I've tried with different users and the same problem happens again.
ROOT CAUSE: This is a known issue and a bug: https://issues.apache.org/jira/browse/HIVE-13120 (listed as BUG-53108 / HIVE-13120 in the HDP 2.4.2 fixed issues: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_HDP_RelNotes/content/fixed_issues.html)
RESOLUTION: Changed the property below in the HiveServer2 configs and restarted HiveServer2, after which the Ranger HDFS audit showed the user as hive.
From:
"hive.server2.enable.doAs"=true
To:
"hive.server2.enable.doAs"=false