Created 05-08-2019 06:09 PM
I have a cluster that was running fine. I decommissioned a few DataNodes and then recommissioned them. Since then, I cannot start services from the Ambari Web UI. I verified that the Ambari server is running. I also followed the suggestions from this post: I restarted the Ambari agents on all cluster nodes, rebooted all machines, and verified that port 8080 is listening:
root@server88:/home/user# netstat -anop | grep 8080
tcp6       0      0 :::8080                 :::*                    LISTEN      20524/java           off (0.00/0/0)
tcp6       0      0 10.1.30.180:8080        10.1.50.62:50358        ESTABLISHED 20524/java           off (0.00/0/0)
tcp6       0      0 10.1.30.180:8080        10.1.50.62:50352        ESTABLISHED 20524/java           off (0.00/0/0)
tcp6       0      0 10.1.30.180:8080        10.1.50.62:50360        ESTABLISHED 20524/java           off (0.00/0/0)
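The netstat output only shows that a process on the server is bound to 8080. A minimal Python sketch of a complementary check, assuming you run it from another machine (for example the one running your browser), attempts a real TCP connection to the server port. The function name `port_is_listening` is hypothetical and not part of Ambari.

```python
import socket


def port_is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unresolvable name, or unreachable host.
        return False
```

For example, `port_is_listening("ambari-server.example.com", 8080)` (hypothetical hostname) returning False from a remote host while netstat shows LISTEN would point at a network or firewall issue rather than the server process.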
I did not find any errors in the ambari-server log file either.
Created 05-08-2019 06:47 PM
Identify the hosts registered with Ambari:
select host_id,host_name from hosts;
Based on that output, I can help you with the next step.
Created on 05-08-2019 08:40 PM - edited 08-17-2019 03:31 PM
How do I find the host_id? In the UI I can see the host_name, but not the host_id.
Created 05-08-2019 09:30 PM
Sorry, I thought it was straightforward. You need to log in to MySQL as the Ambari database user and query the ambari database.
In the example below, both the user name and the password are ambari:
# mysql -uambari -pambari
MariaDB [(none)]> use ambari;
Database changed
MariaDB [ambari]> select host_id,host_name from hosts;
+---------+-------------------+
| host_id | host_name         |
+---------+-------------------+
|       1 | nanetog.kento.com |
+---------+-------------------+
Get the host IDs and hostnames of all the hosts and make sure they match what you have in /etc/hosts. Also check for duplicate entries.
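To compare the query result with /etc/hosts, here is a small Python sketch, assuming you copy the host_name column into a list and read /etc/hosts as text. `parse_etc_hosts` and `check_hosts` are hypothetical helper names. The sketch reports names that appear on more than one line (double entries) and registered hosts that /etc/hosts does not list at all.

```python
from __future__ import annotations

from collections import defaultdict


def parse_etc_hosts(text: str) -> dict[str, list[str]]:
    """Map every hostname or alias in /etc/hosts-style text to the IPs listed for it."""
    name_to_ips: dict[str, list[str]] = defaultdict(list)
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments; skip blank lines
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            name_to_ips[name].append(ip)
    return dict(name_to_ips)


def check_hosts(etc_hosts_text: str, ambari_host_names: list[str]) -> dict[str, list[str]]:
    """Report names listed on more than one line, and Ambari hosts missing from the file."""
    name_to_ips = parse_etc_hosts(etc_hosts_text)
    duplicates = sorted(n for n, ips in name_to_ips.items() if len(ips) > 1)
    missing = [h for h in ambari_host_names if h not in name_to_ips]
    return {"duplicates": duplicates, "missing": missing}
```

On a node you might call `check_hosts(open("/etc/hosts").read(), names)`; running it on every node checks each node's own copy of the file.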
HTH
Created 05-10-2019 09:25 PM
Thanks for the command. I tried it, but it looks like I am using MySQL rather than MariaDB. Here is the output:
spark@msl-dpe-perf88:/home/harry.li/TPC/benchmarks/tpcds-HW$ mysql -u ambari -p
Enter password:
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 4
Server version: 5.7.23-0ubuntu0.16.04.1 (Ubuntu)

Copyright (c) 2000, 2018, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> use ambari;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> select host_id,host_name from hosts;
Empty set (0.00 sec)

mysql>
Created 05-10-2019 10:18 PM
Very bizarre. How many hosts do you have in your cluster? It seems no host is registered with Ambari, which is super weird.
MariaDB and MySQL are essentially the same for this purpose: MariaDB is a fork of MySQL that was started after Oracle acquired MySQL, so the same client commands work on both.
As the ambari user, please run the following against the ambari database:
mysql> select ipv4 from hosts;
My fear is that you decommissioned and removed all the hosts. Can you access the Ambari UI? If so, can you share the hosts that are visible in Ambari?
Please get back to me.
Created on 05-10-2019 10:30 PM - edited 08-17-2019 03:31 PM
It looks like the hosts table is empty. However, I can see all hosts and their heartbeats.
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| ambari             |
| hive               |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
6 rows in set (0.00 sec)

mysql> select ipv4 from hosts;
Empty set (0.00 sec)

mysql> show tables;
+-------------------------------+
| Tables_in_ambari              |
+-------------------------------+
| ClusterHostMapping            |
| QRTZ_BLOB_TRIGGERS            |
| QRTZ_CALENDARS                |
| QRTZ_CRON_TRIGGERS            |
| QRTZ_FIRED_TRIGGERS           |
| QRTZ_JOB_DETAILS              |
| QRTZ_LOCKS                    |
| QRTZ_PAUSED_TRIGGER_GRPS      |
| QRTZ_SCHEDULER_STATE          |
| QRTZ_SIMPLE_TRIGGERS          |
| QRTZ_SIMPROP_TRIGGERS         |
| QRTZ_TRIGGERS                 |
| adminpermission               |
| adminprincipal                |
| adminprincipaltype            |
| adminprivilege                |
| adminresource                 |
| adminresourcetype             |
| alert_current                 |
| alert_definition              |
| alert_group                   |
| alert_group_target            |
| alert_grouping                |
| alert_history                 |
| alert_notice                  |
| alert_target                  |
| alert_target_states           |
| ambari_operation_history      |
| ambari_sequences              |
| artifact                      |
| blueprint                     |
| blueprint_configuration       |
| blueprint_setting             |
| clusterconfig                 |
| clusters                      |
| clusterservices               |
| clusterstate                  |
| confgroupclusterconfigmapping |
| configgroup                   |
| configgrouphostmapping        |
| execution_command             |
| extension                     |
| extensionlink                 |
| groups                        |
| host_role_command             |
| host_version                  |
| hostcomponentdesiredstate     |
| hostcomponentstate            |
| hostconfigmapping             |
| hostgroup                     |
| hostgroup_component           |
| hostgroup_configuration       |
| hosts                         |
| hoststate                     |
| kerberos_descriptor           |
| kerberos_principal            |
| kerberos_principal_host       |
| key_value_store               |
| members                       |
| metainfo                      |
| permission_roleauthorization  |
| remoteambaricluster           |
| remoteambariclusterservice    |
| repo_version                  |
| request                       |
| requestoperationlevel         |
| requestresourcefilter         |
| requestschedule               |
| requestschedulebatchrequest   |
| role_success_criteria         |
| roleauthorization             |
| servicecomponent_version      |
| servicecomponentdesiredstate  |
| serviceconfig                 |
| serviceconfighosts            |
| serviceconfigmapping          |
| servicedesiredstate           |
| setting                       |
| stack                         |
| stage                         |
| topology_host_info            |
| topology_host_request         |
| topology_host_task            |
| topology_hostgroup            |
| topology_logical_request      |
| topology_logical_task         |
| topology_request              |
| upgrade                       |
| upgrade_group                 |
| upgrade_history               |
| upgrade_item                  |
| users                         |
| viewentity                    |
| viewinstance                  |
| viewinstancedata              |
| viewinstanceproperty          |
| viewmain                      |
| viewparameter                 |
| viewresource                  |
| viewurl                       |
| widget                        |
| widget_layout                 |
| widget_layout_user_widget     |
+-------------------------------+
103 rows in set (0.01 sec)

mysql> select * from hosts;
Empty set (0.00 sec)

mysql>
Created 05-10-2019 11:00 PM
You have a 9-node cluster and none of the nodes is registered with Ambari; that's very bizarre! Can you explain exactly what you did?
You wrote "I decommissioned few Data Nodes": how many nodes did you decommission, and are they visible in the screenshot you attached?
Can you ping all those nodes to confirm that they are up? Can you also ssh to each of them and validate that /etc/hosts has valid entries and IPs (hopefully all fixed IPs)?
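Before pinging, it can help to confirm how each name actually resolves on a given node. A minimal Python sketch, assuming the standard system resolver (which on a typical Linux nsswitch setup consults /etc/hosts before DNS): it resolves each hostname and flags names that map to a loopback address, such as the 127.0.1.1 entry Debian and Ubuntu often add for the machine's own hostname. `resolve_hosts` is a hypothetical helper, not an Ambari tool.

```python
import ipaddress
import socket


def resolve_hosts(host_names):
    """Resolve each name via the system resolver and annotate suspicious results."""
    report = {}
    for name in host_names:
        try:
            ip = socket.gethostbyname(name)  # IPv4 lookup; honours /etc/hosts via nsswitch
        except OSError as exc:  # socket.gaierror is a subclass of OSError
            report[name] = f"UNRESOLVED ({exc})"
            continue
        if ipaddress.ip_address(ip).is_loopback:
            # Any 127.0.0.0/8 address: other nodes cannot reach this host by that IP.
            report[name] = f"{ip} (loopback: check /etc/hosts)"
        else:
            report[name] = ip
    return report
```

Running it on every node with the same list of cluster hostnames should give identical, non-loopback IPs everywhere; any difference is worth fixing before looking further.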
Please also copy and paste your ambari-agent.ini here.
Can you also check with this API call? Assuming your Ambari user and password are both admin, replace <ambari-server:port> with the correct values for your Ambari server and port (usually 8080):
curl -i -H "X-Requested-By: ambari" -u admin:admin -X GET http://<ambari-server:port>/api/v1/hosts
Sample output:
{
  "href" : "http://osaka:8080/api/v1/hosts",
  "items" : [
    {
      "href" : "http://osaka:8080/api/v1/hosts/osaka.com",
      "Hosts" : {
        "cluster_name" : "osaka",
        "host_name" : "osaka.com"
      }
    }
  ]
}
This should confirm which hosts are registered.
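If you prefer to script the same call, here is a minimal Python sketch using only the standard library. It assumes the same admin/admin credentials and sends the same X-Requested-By header as the curl command; `fetch_hosts_json` and `registered_hosts` are hypothetical names. Parsing is kept separate from the HTTP request so it can be checked against the sample output without a live server.

```python
import base64
import json
import urllib.request


def fetch_hosts_json(base_url: str, user: str, password: str) -> dict:
    """GET <base_url>/api/v1/hosts with basic auth, like the curl command."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        f"{base_url}/api/v1/hosts",
        headers={"X-Requested-By": "ambari", "Authorization": f"Basic {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def registered_hosts(payload: dict) -> list:
    """Extract (cluster_name, host_name) pairs from the /api/v1/hosts response body."""
    return [
        (item["Hosts"].get("cluster_name"), item["Hosts"]["host_name"])
        for item in payload.get("items", [])
    ]
```

Usage: `registered_hosts(fetch_hosts_json("http://<ambari-server:port>", "admin", "admin"))`, with the placeholder replaced as described for curl.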
Created 05-10-2019 11:18 PM
This is what I did:
I set up the cluster with 8 DataNodes and tested it; it worked fine. I then decommissioned 4 DataNodes, and the smaller cluster with 1 NameNode and 4 DataNodes also worked fine. I then brought the 4 decommissioned DataNodes back with the "Recommission" command in the Ambari Web UI. The cluster worked fine for a few days until it suddenly ran into this problem. Here is the output from curl:
root@msl-dpe-perf88:/home/harry.li# curl -i -H "X-Requested-By: ambari" -u admin:admin -X GET http://msl-dpe-perf88.msl.lab:8080/api/v1/hosts
HTTP/1.1 200 OK
X-Frame-Options: DENY
X-XSS-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Cache-Control: no-store
Pragma: no-cache
Set-Cookie: AMBARISESSIONID=onn7abudz0gc1fzd6hw0wp9nj;Path=/;HttpOnly
Expires: Thu, 01 Jan 1970 00:00:00 GMT
User: admin
Content-Type: text/plain
Vary: Accept-Encoding, User-Agent
Content-Length: 1940

{
  "href" : "http://msl-dpe-perf88.msl.lab:8080/api/v1/hosts",
  "items" : [
    {
      "href" : "http://msl-dpe-perf88.msl.lab:8080/api/v1/hosts/msl-dpe-d10.msl.lab",
      "Hosts" : {
        "cluster_name" : "HW8N",
        "host_name" : "msl-dpe-d10.msl.lab"
      }
    },
    {
      "href" : "http://msl-dpe-perf88.msl.lab:8080/api/v1/hosts/msl-dpe-d9.msl.lab",
      "Hosts" : {
        "cluster_name" : "HW8N",
        "host_name" : "msl-dpe-d9.msl.lab"
      }
    },
    {
      "href" : "http://msl-dpe-perf88.msl.lab:8080/api/v1/hosts/msl-dpe-perf82.msl.lab",
      "Hosts" : {
        "cluster_name" : "HW8N",
        "host_name" : "msl-dpe-perf82.msl.lab"
      }
    },
    {
      "href" : "http://msl-dpe-perf88.msl.lab:8080/api/v1/hosts/msl-dpe-perf83.msl.lab",
      "Hosts" : {
        "cluster_name" : "HW8N",
        "host_name" : "msl-dpe-perf83.msl.lab"
      }
    },
    {
      "href" : "http://msl-dpe-perf88.msl.lab:8080/api/v1/hosts/msl-dpe-perf84.msl.lab",
      "Hosts" : {
        "cluster_name" : "HW8N",
        "host_name" : "msl-dpe-perf84.msl.lab"
      }
    },
    {
      "href" : "http://msl-dpe-perf88.msl.lab:8080/api/v1/hosts/msl-dpe-perf85.msl.lab",
      "Hosts" : {
        "cluster_name" : "HW8N",
        "host_name" : "msl-dpe-perf85.msl.lab"
      }
    },
    {
      "href" : "http://msl-dpe-perf88.msl.lab:8080/api/v1/hosts/msl-dpe-perf86.msl.lab",
      "Hosts" : {
        "cluster_name" : "HW8N",
        "host_name" : "msl-dpe-perf86.msl.lab"
      }
    },
    {
      "href" : "http://msl-dpe-perf88.msl.lab:8080/api/v1/hosts/msl-dpe-perf87.msl.lab",
      "Hosts" : {
        "cluster_name" : "HW8N",
        "host_name" : "msl-dpe-perf87.msl.lab"
      }
    },
    {
      "href" : "http://msl-dpe-perf88.msl.lab:8080/api/v1/hosts/msl-dpe-perf88.msl.lab",
      "Hosts" : {
        "cluster_name" : "HW8N",
        "host_name" : "msl-dpe-perf88.msl.lab"
      }
    }
  ]
}
root@msl-dpe-perf88:/home/harry.li#
Created on 05-11-2019 12:32 AM - edited 08-17-2019 03:31 PM
This is what I see when I try to start all services from Ambari. The NameNode status shows that nothing has started. How can I track where it got stuck?
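One way to see where a start-all operation stalls is to inspect the individual tasks of that request through the REST API. This is an outline under assumptions: it assumes the tasks endpoint `/api/v1/clusters/HW8N/requests/<request-id>/tasks?fields=Tasks/*` (cluster name taken from the curl output above, `<request-id>` left as a placeholder) returns items carrying `Tasks/host_name`, `Tasks/role`, `Tasks/command` and `Tasks/status`, so please verify these names against the API documentation for your Ambari version. `unfinished_tasks` is a hypothetical helper that lists every task whose status is not COMPLETED, which points at the host and component where things got stuck.

```python
def unfinished_tasks(tasks_payload: dict) -> list:
    """List (host, role, command, status) for every task not in COMPLETED state.

    Assumes the response shape {"items": [{"Tasks": {...}}, ...]} (an assumption,
    see the lead-in); missing fields come back as None.
    """
    stuck = []
    for item in tasks_payload.get("items", []):
        task = item.get("Tasks", {})
        if task.get("status") != "COMPLETED":
            stuck.append(
                (task.get("host_name"), task.get("role"), task.get("command"), task.get("status"))
            )
    return stuck
```

Feed it the JSON body returned by the same curl pattern used earlier (for example after `json.loads`). Tasks that stay QUEUED or PENDING on one host may suggest that host's agent is not picking up commands.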