Member since: 10-19-2015

52 Posts
3 Kudos Received
1 Solution

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 2536 | 09-28-2016 03:27 PM
02-15-2018
09:34 AM
Hello,

This Python script helps to remove hosts from the cluster. The steps are:
1. Stop and decommission all roles on a host.
2. Remove the roles from the host, identifying and deleting each role one by one.
3. Remove the host from the cluster.
4. Remove the host from Cloudera Manager.

The script removes hosts from a Cloudera-managed cluster running in AWS. It is intended to scale down worker nodes (the NodeManager role) and gateway roles once the demand is over. You can adapt the script to your environment.

#!/bin/python
import os
import requests
import json
import boto3
import time
from requests.auth import HTTPBasicAuth

# AWS credentials and region; replace with values for your environment
os.environ["AWS_ACCESS_KEY_ID"] = "ACCESS_KEY"
os.environ["AWS_SECRET_ACCESS_KEY"] = "SECRET_ACCESS_KEY"
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"
region = 'us-east-1'

# discover this instance's ID and hostname from the EC2 instance metadata service
metadata = requests.get(url='http://169.254.169.254/latest/meta-data/instance-id')
instance_id = metadata.text
host = requests.get(url='http://169.254.169.254/latest/meta-data/hostname')
host_id = host.text
# Cloudera Manager connection details
username = 'admin'
password = 'admin'
cluster_name = 'cluster001'
scm_protocol = 'http'
scm_host = 'host.compute-1.amazonaws.com'
scm_port = '7180'
scm_api = 'v17'
# check this instance's Auto Scaling lifecycle state
client = boto3.client('autoscaling', region_name=region)
response = client.describe_auto_scaling_instances(InstanceIds=[instance_id])
state = response['AutoScalingInstances'][0]['LifecycleState']
print "vm is in " + state
if state == 'Terminating:Wait':
print "host decommision started"
##decommission host
service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/cm/commands/hostsDecommission'
#service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/cm/hostsRecommission'
#service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/cm/commands/hostsStartRoles'
print service_url
headers = {'content-type': 'application/json'}
req_body = { "items":[ host_id ]}
print req_body
req = requests.post(url=service_url, auth=HTTPBasicAuth(username, password), data=json.dumps(req_body), headers=headers)
print req.text
time.sleep(120)
##delete roles in a host
api_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/hosts/' + host_id
req = requests.get(api_url, auth=HTTPBasicAuth(username, password))
a = json.loads(req.content)
for i in a['roleRefs']:
scm_uri='/api/' + scm_api + '/clusters/' + cluster_name + '/services/'+i['serviceName']+'/roles/'+i['roleName']
scm_url = scm_protocol + '://' + scm_host + ':' + scm_port + scm_uri
print scm_url
req = requests.delete(scm_url, auth=HTTPBasicAuth(username, password))
print req.text
time.sleep(10)
##remove host from cluster
service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/clusters/' + cluster_name + '/hosts/' + host_id
print service_url
req = requests.delete(service_url, auth=HTTPBasicAuth(username, password))
time.sleep(10)
##remove host from cloudera manager
os.system("/etc/init.d/cloudera-scm-agent stop")
service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/hosts/' + host_id
print service_url
req = requests.delete(service_url, auth=HTTPBasicAuth(username, password))
print req.text
time.sleep(10)
##refresh cluster configuration
service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/clusters/' + 'commands/refresh'
print service_url
req = requests.post(service_url, auth=HTTPBasicAuth(username, password))
print req.text
time.sleep(10)
##deploy client configuration
service_url = scm_protocol + '://' + scm_host + ':' + scm_port + '/api/' + scm_api + '/clusters/' + 'commands/deployClientConfig'
print service_url
req = requests.post(service_url, auth=HTTPBasicAuth(username, password))
print req.text
time.sleep(10) Best Regards, Radhakrishnan Rk
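P.S. If the scale-in is driven by an Auto Scaling lifecycle hook (which is what puts the instance into the Terminating:Wait state), Auto Scaling holds the instance until the hook is completed or its heartbeat times out. The script above relies on the timeout; a minimal sketch of completing the hook explicitly, using placeholder hook and group names (neither appears in the script):

# complete the lifecycle hook so Auto Scaling can proceed with termination;
# 'graceful-decommission' and 'cluster001-workers' are placeholder names
client.complete_lifecycle_action(
    LifecycleHookName='graceful-decommission',
    AutoScalingGroupName='cluster001-workers',
    LifecycleActionResult='CONTINUE',
    InstanceId=instance_id,
)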
07-07-2017
07:40 AM
Hi, I am seeing a similar kind of issue on my NameNode servers. When I tried to restart my NameNode, it initially failed with permission issues on /data1/dfs/nn: the group identifier didn't exist in my list of groups. So I changed the permissions on /data1/dfs/nn and restarted. This fixed the issue with /data1/dfs/nn, but the NameNode later started throwing similar errors for /data2/dfs/nn. I repeated the same steps on /data2/dfs/nn hoping that would fix the issue, but no luck. For the same NameNode, why does the behavior differ between the data dirs? Any pointers?
11-21-2016
02:53 PM
1 Kudo
Hi keagles, manual failover can still be useful in some cases. For example, if you would like to do some HW/OS maintenance on the host running the active NameNode, you can fail over manually to the other NN without disrupting running processes on the cluster. You can also make configuration changes to the NNs the same way: change the config, restart the standby NN so it comes up with the new configuration, fail over, then restart the other one. You have just updated your NN settings without needing a full cluster stop. Neither of these is possible with a non-HA configuration. cheers, zegab
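For completeness, the manual failover step itself can be scripted; a minimal sketch invoking the hdfs haadmin CLI from Python, where nn1 and nn2 are placeholder serviceIds from your dfs.ha.namenodes setting:

import subprocess

# make nn2 the active NameNode and nn1 the standby;
# 'nn1' and 'nn2' are placeholders for your configured NameNode IDs
subprocess.check_call(['hdfs', 'haadmin', '-failover', 'nn1', 'nn2'])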
10-31-2016
02:25 PM
Thanks, the wait() worked. But I was wondering which actions have a wait method. Now I know that service start, decommission, etc. have a wait method, but role start/stop, etc. do not. Also, is there a timeout option for wait()?
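On the timeout question: in the cm_api Python client, the ApiCommand objects returned by these calls accept an optional timeout (in seconds) on wait(); if the timeout expires, the command may still be running. A minimal sketch from my reading of the client, with placeholder host, credentials, and names:

from cm_api.api_client import ApiResource

api = ApiResource('cm-host', username='admin', password='admin')  # placeholder connection details
cluster = api.get_cluster('cluster001')  # placeholder cluster name
service = cluster.get_service('hdfs01')  # placeholder service name

cmd = service.restart()      # returns an ApiCommand
cmd = cmd.wait(timeout=300)  # stop waiting after 300 seconds
print(cmd.success)           # None while the command is still active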
10-14-2016
08:23 AM
I've checked our internal codebase, and it appears to have been MIA for a while. Happy to report that it's available in our latest v12 (v5.7 [0]) Python API [1].
[0] http://cloudera.github.io/cm_api/docs/releases/
[1] https://github.com/cloudera/cm_api/blob/cm5-5.7.0/python/src/cm_api/endpoints/clusters.py#L624-L631
10-12-2016
04:28 PM
The latter. The active NN will write every edit to all available JournalNodes. A majority of confirmations is required before the transaction is committed - that is, 2 out of your 3 JournalNodes have to have accepted the write. When the standby NN reads from the JournalNodes, it will again look for a majority and discard any discrepancy. I think if one of the three JNs is down and the remaining two don't agree, it will accept the most up-to-date JN's answer. But I could be completely wrong on this.
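The quorum arithmetic itself is simple; a quick sketch of the majority rule described above:

# minimum number of JournalNodes that must acknowledge an edit
def majority(journal_node_count):
    return journal_node_count // 2 + 1

print(majority(3))  # 2 of 3 JournalNodes must accept the write
print(majority(5))  # 3 of 5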
09-28-2016
03:27 PM
I found the answer: I should use cluster.add_hosts(host_ids), which takes a list of host IDs. I hope Cloudera can document this function call here: https://cloudera.github.io/cm_api/epydoc/5.8.0/cm_api-module.html
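For anyone who finds this later, a minimal sketch of the call in the cm_api Python client (connection details and host IDs are placeholders):

from cm_api.api_client import ApiResource

api = ApiResource('cm-host', username='admin', password='admin')  # placeholder connection details
cluster = api.get_cluster('cluster001')  # placeholder cluster name

# add_hosts takes a list of host IDs that Cloudera Manager already knows about
cluster.add_hosts(['host-id-1', 'host-id-2'])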
09-27-2016
01:49 PM
NP and thanks! I was also confused at first: when adding a new host in the CM UI, there is a step where parcels are downloaded, distributed, and activated on all hosts. That made me think managing parcels was a required step when adding new hosts through the API.
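That said, parcels can still be driven through the API when you do want to manage them yourself; a sketch with the cm_api client, where the connection details, product, and version strings are all placeholders:

from cm_api.api_client import ApiResource

api = ApiResource('cm-host', username='admin', password='admin')  # placeholder connection details
cluster = api.get_cluster('cluster001')  # placeholder cluster name

# 'CDH' and the version string below are placeholders
parcel = cluster.get_parcel('CDH', '5.8.0-1.cdh5.8.0.p0.42')
print(parcel.stage)  # e.g. AVAILABLE, DOWNLOADED, DISTRIBUTED, ACTIVATED
if parcel.stage == 'DOWNLOADED':
    parcel.start_distribution()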