Support Questions

prasad4u_com · ‎03-01-2016

I decommissioned 4 of the cluster's 8 DataNodes and NodeManagers. I checked the dfs.exclude file, and it contains the host names of the decommissioned nodes. I restarted the NameNode. The dashboard still shows 8 DataNodes live, but only 4 NodeManagers live. Why is the change not taking effect for the DataNodes?
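For reference, the exclude-file check described above could be scripted roughly as follows. This is only a sketch: it assumes the exclude file is a plain list with one host name per line, and the function name `is_excluded` and the file path in the usage comment are hypothetical. `hdfs dfsadmin -refreshNodes` and `hdfs dfsadmin -report` are standard HDFS admin commands and appear only as comments.

```shell
# is_excluded FILE HOST
# Succeeds (exit status 0) when HOST appears as a complete line in FILE.
# -F: treat HOST as a fixed string, -x: match the whole line, -q: no output.
is_excluded() {
  grep -Fxq -- "$2" "$1"
}

# Example usage (the path is an assumption; use the file that
# dfs.hosts.exclude points to on your cluster):
#   is_excluded /etc/hadoop/conf/dfs.exclude serf010ext.etops.tllsc.net \
#     && echo "host is listed in the exclude file"
#
# After editing the exclude file, the NameNode can be asked to re-read it,
# and the DataNode states can then be listed:
#   hdfs dfsadmin -refreshNodes
#   hdfs dfsadmin -report
```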

prasad4u_com · ‎03-02-2016

Step 1 : Decommission Nodemanagers from the cluster

Command :

curl -u admin:password -i -H 'X-Requested-By: ambari' -X POST -d '{ "RequestInfo":{ "context":"Decommission NodeManagers", "command":"DECOMMISSION", "parameters":{ "slave_type":"NODEMANAGER", "excluded_hosts":"serf010ext.etops.tllsc.net,serf020ext.etops.tllsc.net,villein010ext.etops.tllsc.net,villein020ext.etops.tllsc.net" }, "operation_level":{ "level":"HOST_COMPONENT", "cluster_name":"Name of the cluster" } }, "Requests/resource_filters":[ { "service_name":"YARN", "component_name":"RESOURCEMANAGER" } ]}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/requests

Step 2 : Decommission DataNodes from cluster

Command :

curl -u admin:password -i -H 'X-Requested-By: ambari' -X POST -d '{ "RequestInfo":{ "context":"Decommission DataNodes", "command":"DECOMMISSION", "parameters":{ "slave_type":"DATANODE", "excluded_hosts":"serf010ext.etops.tllsc.net,serf020ext.etops.tllsc.net,villein010ext.etops.tllsc.net,villein020ext.etops.tllsc.net" }, "operation_level":{ "level":"HOST_COMPONENT", "cluster_name":"Name of the cluster" } }, "Requests/resource_filters":[ { "service_name":"HDFS", "component_name":"NAMENODE" } ]}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/requests
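The two requests in Steps 1 and 2 differ only in the slave type, the service and the master component, so the JSON body could be generated instead of typed by hand. The sketch below shows one way to do that; the helper names `join_hosts` and `decommission_payload` are hypothetical, and the JSON fields are copied from the commands above rather than from any other source.

```shell
# join_hosts HOST...
# Joins the host names with commas and no spaces, as excluded_hosts expects.
join_hosts() {
  ( IFS=,; printf '%s' "$*" )
}

# decommission_payload CLUSTER SLAVE_TYPE HOSTS SERVICE MASTER
# Prints the JSON request body used in Steps 1 and 2.
decommission_payload() {
  printf '{"RequestInfo":{"context":"Decommission %s","command":"DECOMMISSION","parameters":{"slave_type":"%s","excluded_hosts":"%s"},"operation_level":{"level":"HOST_COMPONENT","cluster_name":"%s"}},"Requests/resource_filters":[{"service_name":"%s","component_name":"%s"}]}' \
    "$2" "$2" "$3" "$1" "$4" "$5"
}

# Example usage (same placeholders as in the commands above):
#   hosts=$(join_hosts serf010ext.etops.tllsc.net serf020ext.etops.tllsc.net)
#   curl -u admin:password -i -H 'X-Requested-By: ambari' -X POST \
#     -d "$(decommission_payload cluster_name DATANODE "$hosts" HDFS NAMENODE)" \
#     http://ambari_hostname:8080/api/v1/clusters/cluster_name/requests
```

Building the host list with `join_hosts` also avoids stray spaces or line breaks inside the excluded_hosts value.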

Step 3 : Stop the DataNode service on each of the decommissioned nodes

Command:

curl -u admin:password -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/hosts/serf010ext.etops.tllsc.net/host_compo...

Command:

curl -u admin:password -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/hosts/serf020ext.etops.tllsc.net/host_compo...

Command:

curl -u admin:password -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/hosts/villein010ext.etops.tllsc.net/host_co...

Command:

curl -u admin:password -i -H 'X-Requested-By: ambari' -X PUT -d '{"HostRoles": {"state": "INSTALLED"}}' http://ambari_hostname:8080/api/v1/clusters/cluster_name/hosts/villein020ext.etops.tllsc.net/host_co...
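Since Step 3 repeats the same request for every host, it could be driven by a small loop. The sketch below only prints the curl commands so they can be reviewed before running. The function name `stop_commands` is hypothetical, and because the URLs in this post are cut off after `host_compo...`, the rest of the path is deliberately left as an argument (`COMPONENT_PATH`) rather than guessed.

```shell
# stop_commands BASE_URL COMPONENT_PATH HOST...
# Prints (does not execute) one PUT request per host that sets the
# component state to INSTALLED, i.e. stopped.
stop_commands() {
  base_url=$1
  component_path=$2
  shift 2
  for host in "$@"; do
    printf "curl -u admin:password -i -H 'X-Requested-By: ambari' -X PUT -d '%s' %s/hosts/%s/%s\n" \
      '{"HostRoles": {"state": "INSTALLED"}}' "$base_url" "$host" "$component_path"
  done
}

# Example usage; replace COMPONENT_PATH with the full path tail from the
# Ambari API for the DataNode component:
#   stop_commands http://ambari_hostname:8080/api/v1/clusters/cluster_name \
#     COMPONENT_PATH \
#     serf010ext.etops.tllsc.net serf020ext.etops.tllsc.net \
#     villein010ext.etops.tllsc.net villein020ext.etops.tllsc.net
```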

Step 4 : Check the under-replicated and corrupt block counts on the Ambari dashboard. At this point they will be non-zero.

Step 5 : Restart the standby NameNode

Step 6 : Restart the active NameNode

Step 7 : Check the under-replicated and corrupt block counts on the Ambari dashboard again; they should be zero. After the NameNodes restart, the blocks are distributed across the live DataNodes only.

Here serf010ext, serf020ext, villein010ext and villein020ext are the nodes being decommissioned from the cluster.
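The block counts checked on the dashboard could also be read from the command line. The sketch below pulls a single counter out of the text printed by `hdfs dfsadmin -report`. The function name `block_count` is hypothetical, and the label strings are an assumption based on Hadoop 2.x output; check what your version prints before relying on them.

```shell
# block_count LABEL
# Reads report text on stdin and prints the value after the first
# line of the form "LABEL: value". Leading whitespace is ignored.
block_count() {
  awk -F': *' -v label="$1" '
    { sub(/^[ \t]+/, "") }          # strip indentation, fields are re-split
    $1 == label { print $2; exit }  # print the value and stop at first match
  '
}

# Example usage (label text is an assumption, see above):
#   hdfs dfsadmin -report | block_count 'Under replicated blocks'
#   hdfs dfsadmin -report | block_count 'Blocks with corrupt replicas'
```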

Thank you.

smohanty · ‎03-01-2016

I believe decommissioned DataNodes are not stopped automatically; they need to be stopped explicitly through API calls. On the other hand, decommissioned NodeManagers go down automatically.

prasad4u_com · ‎03-01-2016

How can we stop them, then? If you have an example or a link, could you please post it here?

aervits · ‎03-01-2016

@Ram D https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=41812517

prasad4u_com · ‎03-01-2016

@Artem Ervits Once the nodes are excluded and decommissioned, they will not be in the STARTED state. How can we stop them using the link you gave? Can you please tell me the procedure to follow when decommissioning?

aervits · ‎03-01-2016

@Ram D I believe you stop the service first and then decommission. That's the way it's done in Ambari. Please refer to this https://docs.hortonworks.com/HDPDocuments/Ambari-2.1.2.1/bk_Ambari_Users_Guide/content/_deleting_a_h...

prasad4u_com · ‎03-02-2016

I tried stopping the DATANODE and NODEMANAGER services first and then decommissioning the nodes. That did not work: the nodes were not shown as decommissioned, and I got an internal exception. I then decommissioned the NODEMANAGERs and the DATANODEs, in that order, using curl commands, and changed the DATANODE service to the INSTALLED state. I restarted the NameNodes to refresh the live status of the DataNodes, and it was updated successfully. Before the NameNode restart, I could see corrupt and under-replicated blocks; after the restart, they went to zero. The Ambari dashboard now shows four live nodes.

aervits · ‎03-02-2016

It would be nice if you documented the whole procedure and provided it as a solution. @Ram D

klin · ‎03-23-2017

Hi all,

I am wondering whether there is a reliable way to tell when a NodeManager decommission has completed.

For a DataNode decommission, I can check the NameNode's log for completion, but there seems to be no clear message in the ResourceManager's log.

Cheers,

I decommissioned the NODEMANAGERS and DATANODES through the REST API, and the requests succeeded. However, the decommissioned nodes are not reflected in the Ambari dashboard; it still shows all the DataNodes as live. What is the reason?