Member since: 02-08-2016
Posts: 793
Kudos Received: 669
Solutions: 85
12-06-2016
01:02 PM
5 Kudos
SYMPTOM: While performing the NameNode HA setup, the 'hdfs namenode -initializeSharedEdits' step failed with the error below.
ERROR:
[root@localhost conf]# sudo su hdfs -l -c 'hdfs namenode -initializeSharedEdits'
16/11/22 09:43:56 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = locahost/10.200.206.12
STARTUP_MSG: args = [-initializeSharedEdits]
STARTUP_MSG: version = 2.7.1.2.3.0.0-2557
STARTUP_MSG: classpath = /usr/hdp/2.3.0.0-2557/hadoop/conf:/usr/hdp/2.3.0.0-2557/hadoop/lib/commons-cli-1.2.jar:/us r/hdp/2.3.0.0-2557/hadoop/lib/xmlenc-0.52.jar:/usr/hdp/2.3.0.0-2557/hadoop/lib/jsch-0.1.42.jar:/usr/hdp/2.3.0.0-2557/
.
.
.
STARTUP_MSG: build = git@github.com:hortonworks/hadoop.git -r 9f17d40a0f2046d217b2bff90ad6e2fc7e41f5e1; compiled by 'jenkins' on 2015-07-14T13:08Z
STARTUP_MSG: java = 1.8.0_51
************************************************************/
16/11/22 09:43:56 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
16/11/22 09:43:56 INFO namenode.NameNode: createNameNode [-initializeSharedEdits]
16/11/22 09:43:56 ERROR namenode.NameNode: No shared edits directory configured for namespace null namenode null
16/11/22 09:43:56 INFO util.ExitUtil: Exiting with status 0
16/11/22 09:43:56 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at localhost/10.200.206.12
************************************************************/
ROOT CAUSE: While performing NameNode HA we found that Step 5, "Configure Components", completed within a second, which raised suspicion: it was not stopping services or performing the expected steps.
RESOLUTION: Suspected that the Ambari server was holding stale wizard state cached in its DB. Clearing the Ambari server cache using the steps below resolved the issue.
# log out from the Ambari server UI first
# list the persisted key-value pairs (UI cache)
$ curl -i -u admin:admin -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/persist
# get the cached state
$ curl -i -u admin:admin -H 'X-Requested-By: ambari' -X GET http://localhost:8080/api/v1/persist/CLUSTER_CURRENT_STATUS
# reset/clean the cache
$ curl -u admin:admin -H 'X-Requested-By: ambari' -d '{"CLUSTER_CURRENT_STATUS": "{\"clusterState\":\"DEFAULT\"}" }' -X POST 'http://localhost:8080/api/v1/persist'
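The reset call above embeds JSON inside a JSON string, which is easy to get wrong by hand. A minimal Python sketch of building (and optionally sending) the same request; the host and admin:admin credentials mirror the curl example and are illustrative assumptions:

```python
import base64
import json
import urllib.request

def build_reset_payload() -> bytes:
    # The persist API stores values as strings, so the cluster-state JSON
    # must itself be serialized before being embedded in the outer document.
    inner = json.dumps({"clusterState": "DEFAULT"})
    return json.dumps({"CLUSTER_CURRENT_STATUS": inner}).encode("utf-8")

def reset_cluster_state(host="localhost:8080", user="admin", password="admin"):
    # Equivalent of the curl POST above (assumed host/credentials).
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"http://{host}/api/v1/persist",
        data=build_reset_payload(),
        headers={
            "X-Requested-By": "ambari",
            "Authorization": "Basic " + credentials,
        },
        method="POST",
    )
    return urllib.request.urlopen(req)
```

As with the curl version, back up the Ambari server before resetting the cached state.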
11-29-2016
05:40 PM
Found the problem. Investigating hiveserver2.log showed that rangerlogger failed to flush data to the DB due to a permission problem:
Internal Exception: java.sql.SQLException: Access denied for user 'rangerlogger'@'<server>' (using password: YES)
Error Code: 1045
After granting the permissions, it flushed the data successfully. Now I just need to figure out what caused the loss of permissions and how on earth it is related to the server reboot... Thanks for all your help!
12-01-2016
07:36 AM
It was a NameNode connectivity issue; the hosts file on the NameNode had to be fixed.
The first line should be `127.0.0.1 localhost`. In my hosts file it was `127.0.0.1 hostname1 localhost`; I removed `hostname1` and it was fixed.
11-28-2016
01:40 PM
7 Kudos
The Grafana username and password are stored in a sqlite3 database. One approach is to reset the password back to "admin" first; it can then be changed from the Grafana dashboard. The following steps can be used: 1. Log on to the node where Grafana is installed and open the Grafana sqlite3 database:
# sqlite3 /var/lib/ambari-metrics-grafana/grafana.db
sqlite> select salt, password from user;
pyaUhfDzYg|54c7d1ce2eeaa6000bd84407d0f8ab4663dfa575e0a326bc70dc5cab4b864f6677b21879dbf5e33427c88f9160f744b625bf
sqlite> update user set password = '59acf18b94d7eb0694c61e60ce44c110c7a683ac6a8f09580d626f90f4a242000746579358d77dd9e570e83fa24faa88a8a6', salt = 'F3FAxVm33R' where login = 'admin';
sqlite> .exit
2. Once done, edit the Ambari Metrics Server configs and update the Grafana password to "admin".
3. Restart the Ambari Metrics Server.
4. Access the Grafana page using the Quick Links under the Ambari Metrics Server dashboard.
5. Click the Grafana symbol in the top-left corner of the screen and sign in as the admin user with the "admin" password.
6. Click Global Users and edit the admin user to change the password.
7. Once the password is changed, update the Grafana admin password in the Ambari Metrics Server configs as well.
Note: If the password in the Grafana database and in the Ambari Metrics Server configs do not match, errors like the following are observed:
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_grafana.py", line 64, in <module>
AmsGrafana().execute()
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
method(env)
File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 535, in restart
self.start(env)
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_grafana.py", line 46, in start
create_ams_datasource()
File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_grafana_util.py", line 230, in create_ams_datasource
(response.status, response.reason, data))
resource_management.core.exceptions.Fail: Ambari Metrics Grafana data source creation failed. POST request status: 401 Unauthorized
{"message":"Invalid username or password"}
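For reference, the salt/password pair updated in the sqlite3 session above is a salted PBKDF2-HMAC-SHA256 hash. A sketch of deriving such a hash; the iteration count (10000) and key length (50 bytes) are assumptions based on Grafana's legacy defaults, so verify them against your Grafana version before relying on this:

```python
import hashlib

def grafana_password_hash(password: str, salt: str) -> str:
    # PBKDF2-HMAC-SHA256; iteration count and key length are assumed
    # legacy Grafana defaults (10000 iterations, 50-byte key).
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode(), salt.encode(), 10000, 50
    ).hex()

# A hash produced this way is a 100-character hex string, matching the
# length of the value stored in the user table.
print(len(grafana_password_hash("admin", "F3FAxVm33R")))  # 100
```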
04-08-2017
09:26 AM
I tried all the options and am still stuck on this issue 😞
11-19-2016
09:24 PM
4 Kudos
Question: I have installed an HDP cluster using Ambari, with the Ranger service installed and working properly. I enabled the Kafka plugin for Ranger and noticed something a little annoying: if Ranger Admin is down, Kafka takes a long time to start, because it tries to connect to Ranger Admin to get the repository. The error log looks like this:
###
Will retry 74 time(s), caught exception: Connection failed to Ranger Admin. Reason - [Errno 111] Connection refused.. Sleeping for 8 sec(s)
###
Is there a way to decrease the number of retries or the sleep duration between retries? Ranger Admin being down should not have any impact on the components whose plugins are enabled, right?
Findings: Ambari uses the following scripts to return the Ranger admin login check response:
/usr/lib/ambari-agent/lib/resource_management/libraries/functions/ranger_functions_v2.py (and ranger_functions.py)
/usr/lib/ambari-server/lib/resource_management/libraries/functions/ranger_functions_v2.py (and ranger_functions.py)
These scripts have hard-coded values for the retry attempts and the sleep interval, something like the following: {code}
@safe_retry(times=75, sleep_time=8, backoff_factor=1, err_class=Fail, return_on_fail=None)
def check_ranger_login_urllib2(self, url):
"""
:param url: ranger admin host url
:param usernamepassword: user credentials using which repository needs to be searched.
:return: Returns login check response
"""
.
.
.
{code}
So by default Ambari will attempt a total of 75 times, with a sleep interval of 8 seconds, for the Ranger admin login check. If Ranger is down, or does not come up within these attempts, it throws the exception.
REASON FOR THE HARD-CODED VALUES:
1. Blueprint-based deployments need to ensure the starting order of the services.
2. Ranger admin startup time can vary from environment to environment, hence the number of retries was kept high to be safe.
HOW TO:
Q. I would like to decrease these hard-coded values to one minute instead of 10 minutes, i.e. 6 retries with 10 seconds of sleep between retries.
A. The scripts 'ranger_functions_v2.py' (and 'ranger_functions.py') under:
/usr/lib/ambari-agent/lib/resource_management/libraries/functions/
/usr/lib/ambari-server/lib/resource_management/libraries/functions/
control the retry count and sleep timing. Editing the retry attempts and sleep in these scripts can be a temporary workaround. However, altering the Ambari-provided scripts is not recommended without consulting Hortonworks.
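The total wait these hard-coded values produce can be computed directly. The safe_retry decorator itself belongs to Ambari's resource_management library; this small sketch only models its sleep budget, assuming a constant backoff_factor of 1 as in the decorator shown above:

```python
def retry_budget_seconds(times: int, sleep_time: float, backoff_factor: float = 1.0) -> float:
    # Sum the sleep across all retry attempts; with backoff_factor=1
    # the interval between attempts stays constant.
    total, interval = 0.0, float(sleep_time)
    for _ in range(times):
        total += interval
        interval *= backoff_factor
    return total

print(retry_budget_seconds(75, 8))   # Ambari default: 600.0 seconds (10 minutes)
print(retry_budget_seconds(6, 10))   # proposed: 60.0 seconds (1 minute)
```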
11-19-2016
01:43 PM
6 Kudos
PROBLEM STATEMENT: Ambari does not display the service action buttons after login; it displays "Move Master Wizard In Progress". No operation can be performed from Ambari, and the same affects all users.
ROOT CAUSE: A user who logged in to the Ambari UI (acting as admin) started an operation, left it in the middle, and simply logged off from the UI. The operation state was cached by Ambari and hence is reflected for all other users in the UI.
RESOLUTION: Take an Ambari server backup and run the command below from the Ambari node:
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST -d '{"wizard-data":"{\"userName\":\"<username>\",\"controllerName\":\"<controller_name>\"}"}' http://<ambari_host>:8080/api/v1/persist
username = the user for which you are facing the issue
ambari_host = hostname of the Ambari node
controller_name = name of the controller for which you are making the request
Example:
curl -u admin:admin -i -H 'X-Requested-By: ambari' -X POST -d '{"wizard-data":"{\"userName\":\"admin\",\"controllerName\":\"moveMasterController\"}"}' http://ambari.example.com:8080/api/v1/persist
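The double-escaped body in the curl call above is JSON nested inside a JSON string; producing it programmatically avoids hand-escaping mistakes. A quick sketch (the user and controller names are placeholders, as in the command):

```python
import json

def wizard_data_payload(user_name: str, controller_name: str) -> str:
    # The persist API stores values as strings, so the wizard data is
    # serialized twice: once for the inner object, once for the wrapper.
    inner = json.dumps({"userName": user_name, "controllerName": controller_name})
    return json.dumps({"wizard-data": inner})

print(wizard_data_payload("admin", "moveMasterController"))
```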
11-19-2016
01:42 PM
6 Kudos
PROBLEM STATEMENT: We recently added 40 DataNodes to the cluster, and they went down immediately after being added, with an exception from the rack topology script.
ROOT CAUSE: The customer added the 40 nodes to a cluster running a rack topology script as per the link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.0/bk_hdfs_admin_tools/content/ch05.html The topology script the customer copied from that link was missing a "fi" on the last line, which caused the exception.
RESOLUTION: Populated the corrected rack-topology.sh script on the NameNode and DataNodes, after which the DataNode service was able to start.
11-19-2016
01:41 PM
6 Kudos
PROBLEM STATEMENT: Customer deleted the HDFS repository from the Ranger UI. Tried re-enabling the plugin, but the repository was not getting recreated. Reinstalled Ranger, but still no luck. There was an alert, "Ranger Admin Password check":
Text is: This alert is used to ensure that the Ranger Admin password in Ambari is correct.
Response is: User:amb_ranger_admin credentials on Ambari UI are not in sync with Ranger
As a test, changed ranger_admin_password to admin in Ambari (Configs -> Advanced) and restarted Ranger (not prompted to, but doing so anyway). Also changed the local password on the Ranger admin host:
[root@ip-172-53-51-18 admin]# passwd amb_ranger_admin
Changing password for user amb_ranger_admin.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
Did a restart of HDFS, but now HDFS was not coming up.
ERROR: Ranger admin log error: 2016-08-26 11:06:22,917 [http-bio-6080-exec-4] INFO org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:311) - Operation error. response=VXResponse={org.apache.ranger.view.VXResponse@7097c06cstatusCode={1} msgDesc={User is not allowed to update service-def, only Admin can create/update/delete Services} messageList={[VXMessage={org.apache.ranger.view.VXMessage@4718e9efname={OPER_NO_PERMISSION} rbKey={xa.error.oper_no_permission} message={User doesn't have permission to perform this operation} objectId={null} fieldName={null} }]} }
From Ambari startup stderr box:
2016-08-26 11:06:20,803 - Error creating repository. Http status code - 400.
{"statusCode":1,"msgDesc":"User is not allowed to update service-def, only Admin can create/update/delete Services","messageList":[{"name":"OPER_NO_PERMISSION","rbKey":"xa.error.oper_no_permission","message":"User doesn't have permission to perform this operation"}]}
2016-08-26 11:07:08,595 - Error creating repository. Http status code - 400.
{"statusCode":1,"msgDesc":"User is not allowed to update service-def, only Admin can create/update/delete Services","messageList":[{"name":"OPER_NO_PERMISSION","rbKey":"xa.error.oper_no_permission","message":"User doesn't have permission to perform this operation"}]}
2016-08-26 11:07:56,368 - Error creating repository. Http status code - 400.
{"statusCode":1,"msgDesc":"User is not allowed to update service-def, only Admin can create/update/delete Services","messageList":[{"name":"OPER_NO_PERMISSION","rbKey":"xa.error.oper_no_permission","message":"User doesn't have permission to perform this operation"}]}
2016-08-26 11:29:02,647 [http-bio-6080-exec-2] INFO org.apache.ranger.common.RESTErrorUtil (RESTErrorUtil.java:311) - Operation error. response=VXResponse={org.apache.ranger.view.VXResponse@6189b1d4statusCode={1} msgDesc={User is not allowed to update service-def, only Admin can create/update/delete Services} messageList={[VXMessage={org.apache.ranger.view.VXMessage@6291937cname={OPER_NO_PERMISSION} rbKey={xa.error.oper_no_permission} message={User doesn't have permission to perform this operation} objectId={null} fieldName={null} }]} }
ROOT CAUSE: The role of the 'amb_ranger_admin' user was not set to Admin in the Ranger UI.
RESOLUTION: Changing the role of the 'amb_ranger_admin' user to Admin in the Ranger UI resolved the issue.
11-19-2016
01:41 PM
6 Kudos
PROBLEM STATEMENT: We have a strange problem with Ranger. When I run "select * from <table>;" I can see in the Ranger Hive audit that my user (dnid) is getting logged correctly.
But when I look at the same operation in the HDFS audit, it shows that another user made the request.
This is very strange to me; I've tried with different users and the same problem happens again.
ROOT CAUSE: This is a known issue and a bug: https://issues.apache.org/jira/browse/HIVE-13120 (listed as BUG-53108 / HIVE-13120 in the HDP 2.4.2 fixed issues: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_HDP_RelNotes/content/fixed_issues.html)
RESOLUTION: Changed the property below in the HiveServer2 configs and restarted HiveServer2, after which the Ranger HDFS audit showed the user as hive.
From:
"hive.server2.enable.doAs"=true
To:
"hive.server2.enable.doAs"=false