Member since: 10-19-2015

52 Posts
3 Kudos Received
1 Solution

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 2536 | 09-28-2016 03:27 PM
02-15-2018
09:34 AM
Hello,

This Python script helps to remove hosts from the cluster. The steps are:
1. Stop and decommission all roles on a host.
2. Remove the roles from the host, identifying and deleting each role one by one.
3. Remove the host from the cluster.
4. Remove the host from Cloudera Manager.

The script removes hosts from a Cloudera-managed cluster running in AWS. It is intended to scale down worker nodes (the NodeManager role) and gateway roles once the demand is over. You can adapt the script to your environment.

#!/bin/python
import os
import requests
import json
import boto3
import time
from requests.auth import HTTPBasicAuth

# AWS credentials and region; replace with values for your environment
os.environ["AWS_ACCESS_KEY_ID"] = "ACCESS_KEY"
os.environ["AWS_SECRET_ACCESS_KEY"] = "SECRET_ACCESS_KEY"
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"
region = 'us-east-1'

# discover this instance's ID and hostname from the EC2 instance metadata service
metadata = requests.get(url='http://169.254.169.254/latest/meta-data/instance-id')
instance_id = metadata.text
host = requests.get(url='http://169.254.169.254/latest/meta-data/hostname')
host_id = host.text
# Cloudera Manager connection details
username = 'admin'
password = 'admin'
cluster_name = 'cluster001'
scm_protocol = 'http'
scm_host = 'host.compute-1.amazonaws.com'
scm_port = '7180'
scm_api = 'v17'
# check this instance's Auto Scaling lifecycle state
client = boto3.client('autoscaling', region_name=region)
response = client.describe_auto_scaling_instances(InstanceIds=[instance_id])
state = response['AutoScalingInstances'][0]['LifecycleState']
print "vm is in " + state
if state == 'Terminating:Wait':
print "host decommision started"
##decommission host
service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/cm/commands/hostsDecommission'
#service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/cm/hostsRecommission'
#service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/cm/commands/hostsStartRoles'
print service_url
headers = {'content-type': 'application/json'}
req_body = { "items":[ host_id ]}
print req_body
req = requests.post(url=service_url, auth=HTTPBasicAuth(username, password), data=json.dumps(req_body), headers=headers)
print req.text
time.sleep(120)
##delete roles in a host
api_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/hosts/' + host_id
req = requests.get(api_url, auth=HTTPBasicAuth(username, password))
a = json.loads(req.content)
for i in a['roleRefs']:
scm_uri='/api/' + scm_api + '/clusters/' + cluster_name + '/services/'+i['serviceName']+'/roles/'+i['roleName']
scm_url = scm_protocol + '://' + scm_host + ':' + scm_port + scm_uri
print scm_url
req = requests.delete(scm_url, auth=HTTPBasicAuth(username, password))
print req.text
time.sleep(10)
##remove host from cluster
service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/clusters/' + cluster_name + '/hosts/' + host_id
print service_url
req = requests.delete(service_url, auth=HTTPBasicAuth(username, password))
time.sleep(10)
##remove host from cloudera manager
os.system("/etc/init.d/cloudera-scm-agent stop")
service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/hosts/' + host_id
print service_url
req = requests.delete(service_url, auth=HTTPBasicAuth(username, password))
print req.text
time.sleep(10)
##refresh cluster configuration
service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/clusters/' + 'commands/refresh'
print service_url
req = requests.post(service_url, auth=HTTPBasicAuth(username, password))
print req.text
time.sleep(10)
##deploy client configuration
service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/clusters/' + 'commands/deployClientConfig'
print service_url
req = requests.post(service_url, auth=HTTPBasicAuth(username, password))
print req.text
time.sleep(10) Best Regards, Radhakrishnan Rk
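P.S. If the scale-in is driven by an Auto Scaling lifecycle hook (which is what puts the instance into the Terminating:Wait state), Auto Scaling holds the instance until the hook is completed or its heartbeat times out. The script above relies on the timeout; a minimal sketch of completing the hook explicitly, using placeholder hook and group names (neither appears in the script):

# complete the lifecycle hook so Auto Scaling can proceed with termination;
# 'graceful-decommission' and 'cluster001-workers' are placeholder names
client.complete_lifecycle_action(
    LifecycleHookName='graceful-decommission',
    AutoScalingGroupName='cluster001-workers',
    LifecycleActionResult='CONTINUE',
    InstanceId=instance_id,
)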
07-07-2017
07:40 AM
Hi, I am seeing a similar kind of issue on my NameNode servers. When I tried to restart my NameNode, it initially failed with permission issues on /data1/dfs/nn: the group identifier didn't exist in my list of groups. So I changed the permissions on /data1/dfs/nn and restarted. This fixed the issue with /data1/dfs/nn, but the NameNode later started throwing similar errors for /data2/dfs/nn. I repeated the same steps on /data2/dfs/nn hoping that would fix the issue, but no luck. For the same NameNode, why does the behavior differ between the data dirs? Any pointers?
11-21-2016
02:53 PM
1 Kudo
Hi keagles, manual failover can still be useful in some cases. For example, if you would like to do some HW/OS maintenance on the host running the active NameNode, you can fail over manually to the other NN without disrupting running processes on the cluster. You can also make configuration changes to the NNs the same way: change the config, restart the standby NN so it comes up with the new configuration, fail over, then restart the other one. You have just updated your NN settings without needing a full cluster stop. Neither of these is possible with a non-HA configuration. cheers, zegab
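For completeness, the manual failover step itself can be scripted; a minimal sketch invoking the hdfs haadmin CLI from Python, where nn1 and nn2 are placeholder serviceIds from your dfs.ha.namenodes setting:

import subprocess

# make nn2 the active NameNode and nn1 the standby;
# 'nn1' and 'nn2' are placeholders for your configured NameNode IDs
subprocess.check_call(['hdfs', 'haadmin', '-failover', 'nn1', 'nn2'])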
10-31-2016
02:25 PM
Thanks, the wait() worked. But I was wondering which actions have a wait method. Now I know that service start, decommission, etc. have a wait method, but role start/stop, etc. do not. Also, is there a timeout option for wait()?
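On the timeout question: in the cm_api Python client, the ApiCommand objects returned by these calls accept an optional timeout (in seconds) on wait(); if the timeout expires, the command may still be running. A minimal sketch from my reading of the client, with placeholder host, credentials, and names:

from cm_api.api_client import ApiResource

api = ApiResource('cm-host', username='admin', password='admin')  # placeholder connection details
cluster = api.get_cluster('cluster001')  # placeholder cluster name
service = cluster.get_service('hdfs01')  # placeholder service name

cmd = service.restart()      # returns an ApiCommand
cmd = cmd.wait(timeout=300)  # stop waiting after 300 seconds
print(cmd.success)           # None while the command is still active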
10-14-2016
08:23 AM
I've checked our internal codebase, and it appears to have been MIA for a while. Happy to report that it's available in our latest v12 (v5.7 [0]) Python API [1].
[0] http://cloudera.github.io/cm_api/docs/releases/
[1] https://github.com/cloudera/cm_api/blob/cm5-5.7.0/python/src/cm_api/endpoints/clusters.py#L624-L631
10-12-2016
04:28 PM
The latter. The active NN will write every edit to all available JournalNodes. A majority of confirmations is required before the transaction is committed - that is, 2 out of your 3 JournalNodes have to have accepted the write. When the standby NN reads from the JournalNodes, it will again look for a majority and discard any discrepancy. I think if one of the three JNs is down and the remaining two don't agree, it will accept the most up-to-date JN's answer. But I could be completely wrong on this.
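The quorum arithmetic itself is simple; a quick sketch of the majority rule described above:

# minimum number of JournalNodes that must acknowledge an edit
def majority(journal_node_count):
    return journal_node_count // 2 + 1

print(majority(3))  # 2 of 3 JournalNodes must accept the write
print(majority(5))  # 3 of 5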
09-28-2016
03:27 PM
I found the answer: I should use cluster.add_hosts(host_ids), which takes a list of host IDs. I hope Cloudera can document this function call here: https://cloudera.github.io/cm_api/epydoc/5.8.0/cm_api-module.html
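For anyone who finds this later, a minimal sketch of the call in the cm_api Python client (connection details and host IDs are placeholders):

from cm_api.api_client import ApiResource

api = ApiResource('cm-host', username='admin', password='admin')  # placeholder connection details
cluster = api.get_cluster('cluster001')  # placeholder cluster name

# add_hosts takes a list of host IDs that Cloudera Manager already knows about
cluster.add_hosts(['host-id-1', 'host-id-2'])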
09-27-2016
01:49 PM
NP and thanks! I was also confused at first: when adding a new host in the CM UI, there is a step where parcels are downloaded, distributed, and activated on all hosts. That made me think managing parcels was a required step when adding new hosts through the API.
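That said, parcels can still be driven through the API when you do want to manage them yourself; a sketch with the cm_api client, where the connection details, product, and version strings are all placeholders:

from cm_api.api_client import ApiResource

api = ApiResource('cm-host', username='admin', password='admin')  # placeholder connection details
cluster = api.get_cluster('cluster001')  # placeholder cluster name

# 'CDH' and the version string below are placeholders
parcel = cluster.get_parcel('CDH', '5.8.0-1.cdh5.8.0.p0.42')
print(parcel.stage)  # e.g. AVAILABLE, DOWNLOADED, DISTRIBUTED, ACTIVATED
if parcel.stage == 'DOWNLOADED':
    parcel.start_distribution()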