Created 03-01-2016 07:18 PM
I decommissioned 4 DataNodes and NodeManagers out of 8 DataNodes and NodeManagers. I checked the dfs.exclude file; it contains the hostnames of the decommissioned nodes. I restarted the NameNode. The dashboard still shows 8 DataNodes live, while it correctly shows only 4 NodeManagers live. Why is the DataNode count not affected?
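Is there a command-line check I should rely on instead of the dashboard? I was thinking of something like this (just a sketch; the per-node report format may differ by Hadoop version):
hdfs dfsadmin -report | grep -B1 'Decommission Status'
Shouldn't the decommissioned nodes show up there as Decommissioned?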
Created 03-01-2016 07:22 PM
I believe decommissioned DataNodes are not stopped automatically and need to be stopped explicitly through API calls. OTOH, decommissioned NodeManagers go down automatically.
Created 03-01-2016 07:24 PM
How can we stop them, then? If you have an example or a link, could you please post it here?
Created 03-01-2016 07:26 PM
Created 03-01-2016 07:49 PM
@Artem Ervits Once those nodes are excluded and decommissioned, they will no longer be in the STARTED state, so how can we stop them using the link you gave? Can you please tell me the procedure to follow while decommissioning?
Created 03-01-2016 07:52 PM
@Ram D I believe you stop the service first and then decommission. That's the way it's done in Ambari. Please refer to this https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Ambari_Users_Guide/content/_deleting_a_h...
Created 03-02-2016 06:49 PM
I tried stopping the DATANODE and NODEMANAGER components first and then decommissioning the nodes, but the decommission did not go through; it failed with an internal exception. So instead I decommissioned the NODEMANAGER and DATANODE components via curl commands, then set the DATANODE component to the INSTALLED state, and restarted the NameNodes so that the live-DataNode count would update. It updated successfully. Before the NameNode restart I could see corrupt and under-replicated blocks; after the restart they went to zero, and the Ambari dashboard now shows four nodes live.
Created 03-02-2016 06:54 PM
It would be nice if you documented the whole procedure and provided it as a solution. @Ram D
Created 03-02-2016 07:04 PM
Step 1 : Decommission the NodeManagers from the cluster
Command :
curl -u admin:password -i -H 'X-Requested-By: ambari' -X POST -d '{ "RequestInfo":{ "context":"Decommission NodeManagers", "command":"DECOMMISSION", "parameters":{ "slave_type":"NODEMANAGER", "excluded_hosts":"serf010ext.etops.tllsc.net,serf020ext.etops.tllsc.net,villein010ext.etops.tllsc.net,villein020ext.etops.tllsc.net" }, "operation_level":{ "level":"HOST_COMPONENT", "cluster_name":"Name of the cluster" } }, "Requests/resource_filters":[ { "service_name":"YARN", "component_name":"RESOURCEMANAGER" } ]}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/requests
Step 2 : Decommission the DataNodes from the cluster
Command :
curl -u admin:password -i -H 'X-Requested-By: ambari' -X POST -d '{ "RequestInfo":{ "context":"Decommission DataNodes", "command":"DECOMMISSION", "parameters":{ "slave_type":"DATANODE", "excluded_hosts":"serf010ext.etops.tllsc.net,serf020ext.etops.tllsc.net,villein010ext.etops.tllsc.net,villein020ext.etops.tllsc.net" }, "operation_level":{ "level":"HOST_COMPONENT", "cluster_name":"Name of the cluster" } }, "Requests/resource_filters":[ { "service_name":"HDFS", "component_name":"NAMENODE" } ]}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/requests
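Both POST calls return a request id in the response body; before moving on, the request can be polled until it finishes (a sketch, with 42 standing in for whatever id Ambari actually returns):
curl -u admin:password -s 'http://ambari_hostname:8080/api/v1/clusters/cluster_name/requests/42?fields=Requests/request_status,Requests/progress_percent'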
Step 3 : Stop the DataNode component on each of the decommissioned nodes
Command:
curl -u admin:password -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/hosts/serf010ext.etops.tllsc.net/host_compo...
Command:
curl -u admin:password -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/hosts/serf020ext.etops.tllsc.net/host_compo...
Command:
curl -u admin:password -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/hosts/villein010ext.etops.tllsc.net/host_co...
Command:
curl -u admin:password -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/hosts/villein020ext.etops.tllsc.net/host_co...
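The four commands above differ only in the hostname, so the same PUT can also be issued in one loop (a sketch; it assumes the truncated URLs above end in /host_components/DATANODE, the standard Ambari host-component endpoint):
for h in serf010ext.etops.tllsc.net serf020ext.etops.tllsc.net villein010ext.etops.tllsc.net villein020ext.etops.tllsc.net; do
curl -u admin:password -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' "http://ambari_hostname:8080/api/v1/clusters/cluster_name/hosts/$h/host_components/DATANODE"
done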
Step 4 : Check the under-replicated and corrupt block counts on the Ambari dashboard; they will show non-zero numbers.
Step 5 : Restart the Standby NameNode.
Step 6 : Restart the Active NameNode.
Step 7 : Check the under-replicated and corrupt block counts on the Ambari dashboard again; they should now be zero. After the NameNodes restart, the blocks are re-replicated across the live DataNodes only.
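If the dashboard lags behind, the same counts can be cross-checked from the command line with a standard HDFS filesystem check:
hdfs fsck / | grep -E 'Under-replicated blocks|Corrupt blocks'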
Here serf010ext, serf020ext, villein010ext, and villein020ext are the nodes being decommissioned from the cluster.
Thank you.
Created 03-23-2017 07:26 AM
Hi all,
I am wondering if there is a reliable way to tell when a NodeManager decommission has completed?
For a DataNode decommission I can do so by checking the NameNode's log for a completion message, but there seems to be no equivalent message in the ResourceManager's log.
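Would polling the ResourceManager directly be a reliable signal instead of its log? A sketch of what I have in mind (standard YARN CLI and RM REST endpoint; rm_hostname and the 8088 port are placeholders for my setup):
yarn node -list -all | grep -i DECOMMISSIONED
curl -s 'http://rm_hostname:8088/ws/v1/cluster/nodes?states=DECOMMISSIONED'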
Cheers,