Member since: 07-30-2013
Posts: 509
Kudos Received: 113
Solutions: 124

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1511 | 07-09-2018 11:54 AM
 | 1414 | 05-03-2017 11:03 AM
 | 3195 | 03-28-2017 02:27 PM
 | 1092 | 03-27-2017 03:17 PM
 | 1021 | 03-13-2017 04:30 PM
05-20-2019
01:16 PM
Sentry HA is supported as of CM / CDH 5.13.0: https://www.cloudera.com/documentation/enterprise/latest/topics/sg_sentry_ha.html

To enable Sentry HA via the API, after meeting the prerequisites, you will need to:
1) Add an extra Sentry Server role to a host.
2) You may need to go to the Hive service and enable Stored Notifications in Database (an HMS group setting).
3) Generally follow the public doc instructions for enabling Sentry HA with rolling restart, but use API commands instead of GUI actions.
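As a rough illustration of steps 1 and 3, here's a hedged sketch using the cm_api Python client. The cluster name, service name, role name, and host id are assumptions to replace with your own values, and the exact restart/refresh sequence should still follow the public doc.

from cm_api.api_client import ApiResource

# Connect to Cloudera Manager (host and credentials are placeholders)
api = ApiResource("cm-host.example.com", username="admin", password="admin")
cluster = api.get_cluster("Cluster 1")        # assumed cluster name
sentry = cluster.get_service("sentry")        # assumed Sentry service name

# Step 1: add an extra Sentry Server role on another host
sentry.create_role("sentry-SENTRY_SERVER-2",  # assumed role name
                   "SENTRY_SERVER",
                   "new-host-id")             # assumed host id

# Step 3: restart Sentry; the public doc describes doing this as a rolling restart
sentry.restart().wait()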
... View more
07-09-2018
11:54 AM
1 Kudo
Hi, When creating your cluster, Cloudera Manager should automatically detect the directories on each host. You can then use Role Configuration Groups to set distinct configurations for the 10-disk nodes and the 20-disk nodes, and divide roles appropriately between those groups. dfs.data.dir isn't global; it is a role config, so it is usually set in the Role Config Group for a role. You can read more about configuration management here: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_intro_primer.html#concept_fgj_tny_jk

When you add new DataNodes, I suggest creating a host template and applying it to your new nodes, allowing them to easily join the correct DataNode group as well as any other roles you may be running on those nodes (like a YARN NodeManager). You can read about host templates here: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_host_templates.html

Thanks, Darren
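If you script this with the cm_api Python client, a hedged sketch of applying an existing host template to new hosts looks roughly like this; the cluster name, template name, and host ids are assumptions:

from cm_api.api_client import ApiResource

api = ApiResource("cm-host.example.com", username="admin", password="admin")
cluster = api.get_cluster("Cluster 1")                   # assumed cluster name
template = cluster.get_host_template("20-disk-worker")   # assumed template name

# Apply the template to the newly added hosts and start the created roles
template.apply_host_template(["host-id-1", "host-id-2"], start_roles=True).wait()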
... View more
04-02-2018
02:33 PM
Hi Venkat, This is a known issue fixed in the upcoming releases 5.14.2 and 5.15+. It is reported in versions as old as CM 5.12.1 (though I'm not sure whether that's the oldest affected version), and it would explain the issue you're seeing. Thanks, Darren
... View more
02-06-2018
11:16 AM
I'm not sure why that didn't work, sorry. Hopefully a YARN / Hadoop expert can chime in.
... View more
01-18-2018
04:30 PM
1 Kudo
Looks like the YARN ResourceManager WebUI will tell you the racks it sees if you click on Nodes. You could also temporarily modify your script to touch /tmp/didmyscriptrun and then make sure that file's timestamp gets updated at some point.
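For example, if your script happens to be Python, a couple of lines like these near the top would leave a marker whose timestamp you can check later (the path is just an example):

import os

# Leave evidence that the script actually ran; check this file's mtime afterwards
open("/tmp/didmyscriptrun", "a").close()
os.utime("/tmp/didmyscriptrun", None)  # refresh the timestamp on every run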
... View more
01-08-2018
11:01 AM
1 Kudo
Hi, Normally, CM leverages the rack configured by the user in the CM UI to populate the topology script for the cluster. If you override the topology script, Hadoop should use your custom one, but CM does not invoke that script when deciding which rack to display in the UI, so CM will show something different from what your cluster is actually using. The topology.map is not used if you've customized the topology script to do something else. -Darren
... View more
01-02-2018
04:00 PM
It's also worth noting that CM made some defensive changes for this issue in newer versions, so running a CM version at or above one of these may help: 5.10+, 5.9.1+, 5.8.4+, 5.7.5+. I'd be extremely curious if someone could provide consistent reproduction steps for this as well. It seems to happen quite randomly.
... View more
12-20-2017
11:01 AM
Hi, Does "zeppelin-daemon.sh start" start zeppelin as a foreground process, or does it fork? You need to start the daemon as a foreground process so that the PID that CM is monitoring is the same as the PID of the daemon. If it forks, then CM is monitoring just the shell script, sees that the shell script exits shortly after starting, thinks that is the daemon crashing out, and retries starting it up to 2 more times.
... View more
10-27-2017
04:46 PM
Was your cluster using Packages or Parcels for CDH? You can't mix packages and parcels, so this issue could be due to adding an Anaconda Parcel to a Package-based CDH install.
... View more
06-22-2017
02:31 PM
We often default this to false, because we want admins to see failures rather than potentially mask them by doing an auto-restart. You can create a lot of junk if the process is auto-restarting frequently, potentially leading to issues like running out of disk space. If admins find that there's some known problem and they want the auto-restart workaround, then they can toggle it on. Can you elaborate more on why you'd like it on by default? Are there specific common known failures that can cause your roles to die, or is it just to improve availability?
... View more
05-03-2017
11:03 AM
1 Kudo
Hi, api.get_all_hosts() returns just basic information about a host by default (a SUMMARY view). You probably want the FULL view. See the docs here: https://cloudera.github.io/cm_api/epydoc/5.11.0/cm_api.api_client.ApiResource-class.html#get_all_hosts

If you do a simple HTTP GET on {CM_HOST:PORT}/api/v{version}/hosts you can easily see the kind of data returned by api.get_all_hosts(), and if you look at {CM_HOST:PORT}/api/v{version}/hosts?view=FULL you'll see you can get more details, like the role refs.

For your use case of replacing a failed node, there's significant trickiness in getting the steps just right. You may want to look into Cloudera Director, which can repair worker or gateway nodes, among many other features. Here's the doc page for repairing a node: https://www.cloudera.com/documentation/director/latest/topics/director_ui_cluster_shrink.html

Thanks, Darren
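A small hedged sketch of the FULL view with the cm_api Python client (host and credentials are placeholders); the roleRefs field is where the role assignments show up:

from cm_api.api_client import ApiResource

api = ApiResource("cm-host.example.com", username="admin", password="admin")

# view="full" is the client-side equivalent of .../hosts?view=FULL
for host in api.get_all_hosts(view="full"):
    refs = ["%s/%s" % (r.serviceName, r.roleName) for r in host.roleRefs]
    print("%s: %s" % (host.hostname, ", ".join(refs)))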
... View more
03-28-2017
02:27 PM
Thanks for this report! This does indeed appear to be a bug (Paolo dug into it internally, credit to him) and we'll get a fix out in a future release. The "abruptly stop" step should be skipped when there are no started roles, rather than producing an error. Thanks, Darren
... View more
03-27-2017
03:17 PM
1 Kudo
Hi Shant, That's not possible today. Why do you want that? Usually admins don't want to deal with so many certs. You can use additionalConfigs to emit parameters to more places, but be careful to read the caveat about passwords. Thanks, Darren
... View more
03-27-2017
09:59 AM
1 Kudo
1. You can't prevent the abrupt stop, but you shouldn't need to. Is it actually causing a problem? It may just be skipped. Can you show any error message, or post a screenshot?
2. No, that's not possible.
... View more
03-24-2017
06:21 PM
1 Kudo
Hi, Custom stop runners at the role level are planned for a future release. Stay tuned! Until then, the only ways to stop roles are:

1) Standard stop, included by default. CM will basically send a SIGTERM to your process, and if it doesn't die after 30s, it will send a group SIGKILL. You can stop individual roles this way (select what you want on the Instances page, choose Actions for Selected -> Stop), but there's no reasonable way to run a custom stop script.

2) Service-level graceful stop. CM will run a custom script on a master role in your service, which must instruct the workers to exit normally (exit code 0); once those have exited, CM will consider the service-level stop command successful. This is only helpful if your master role can orchestrate the stop, and it will always stop all roles.

Thanks, Darren
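To make option 1 concrete, here's an illustrative Python sketch of the standard-stop behaviour described above (SIGTERM, wait about 30s, then a group SIGKILL). It only models what CM already does for you; it is not something your CSD needs to implement:

import os, signal, time

def standard_stop(pid, grace_seconds=30):
    os.kill(pid, signal.SIGTERM)                 # polite request to exit
    deadline = time.time() + grace_seconds
    while time.time() < deadline:
        try:
            os.kill(pid, 0)                      # raises OSError once the process is gone
        except OSError:
            return True                          # exited within the grace period
        time.sleep(1)
    os.killpg(os.getpgid(pid), signal.SIGKILL)   # group sigkill after the grace period
    return False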
... View more
03-13-2017
04:30 PM
1 Kudo
Hi Shant, The documentation there is incomplete. Here's the information you're looking for:

certificateLocationConfigName - Optional. Config name to emit when ssl_server_certificate_location is used in a config file. If null, ssl_server_certificate_location will not be emitted into config files, and can only be used in substitutions like ${ssl_server_certificate_location}.

certificateLocationDefault - Optional. Default value for ssl_server_certificate_location.

caCertificateLocationConfigName - Optional. Config name to emit when ssl_server_ca_certificate_location is used in a config file. If null, ssl_server_ca_certificate_location will not be emitted into config files, and can only be used in substitutions like ${ssl_server_ca_certificate_location}.

caCertificateLocationDefault - Optional. Default value for ssl_server_ca_certificate_location.

I'll get this added to the wiki in a future update.
... View more
03-02-2017
01:25 PM
Hi, It's best for systems (especially distributed systems) to not require careful ordering in startup. Instead, each process should wait for a bit for any dependency process (like the master) to come up. If possible, I also suggest that this wait period should be configurable, and at least 2 minutes in duration by default. There's no way for CSDs to control the ordering of start commands since we prefer robustness to ordering. Thanks, Darren
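As a rough sketch of that advice (host, port, and defaults are assumptions), a worker could poll its master before doing real work, with a configurable wait of at least two minutes:

import socket
import time

def wait_for_master(host, port, timeout_seconds=120):
    # Poll the master's port until it accepts connections or the timeout expires
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        try:
            conn = socket.create_connection((host, port), timeout=5)
            conn.close()
            return True      # master is reachable
        except socket.error:
            time.sleep(5)    # not up yet; retry
    return False

if not wait_for_master("master.example.com", 9000):
    raise SystemExit("master did not come up within the wait period")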
... View more
02-14-2017
04:16 PM
Depending on how you are running the job, you may be able to override the topology script parameter and/or replace the topology.py script with one that is Python 3 compatible. If you're submitting jobs from the command line, you'd usually copy /etc/hadoop/conf to some custom directory /path/to/customized/conf, make changes there, then set HADOOP_CONF_DIR=/path/to/customized/conf and run your job. Assuming you can change that topology script, here's the relevant portion of the diff that you can apply:

@@ -1,8 +1,8 @@
#!/usr/bin/env python
#
-# Copyright (c) 2010-2012 Cloudera, Inc. All rights reserved.
+# Copyright (c) 2016 Cloudera, Inc. All rights reserved.
#
-
+
'''
This script is provided by CMF for hadoop to determine network/rack topology.
It is automatically generated and could be replaced at any time. Any changes
@@ -12,8 +12,13 @@ made to it will be lost when this happens.
import os
import sys
import xml.dom.minidom
-from string import join
-
+
+try:
+ xrange
+except NameError:
+ # support for python3, which basically renamed xrange to range
+ xrange = range
+
def main():
MAP_FILE = '{{CMF_CONF_DIR}}/topology.map'
DEFAULT_RACK = '/default'
@@ -40,14 +45,14 @@ def main():
map[node.getAttribute("name")] = node.getAttribute("rack")
except:
default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
- print default_rack
+ print(default_rack)
return -1
-
+
default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
if len(sys.argv)==1:
- print default_rack
+ print(default_rack)
else:
- print join([map.get(i, default_rack) for i in sys.argv[1:]], " ")
+ print(" ".join([map.get(i, default_rack) for i in sys.argv[1:]]))
return 0
if __name__ == "__main__":
... View more
01-19-2017
11:51 AM
Hi, You're correct. CM Server displays the files as they are sent to the agent, before shell scripts are run, so they'll show the placeholder values if you used my suggested approach. This is an unfortunate limitation; we don't have server-side support for this sort of advanced custom logic today, and there's no way to update the UI either. I'll mention again that it's best if the underlying product doesn't require multiple configs to be changed when there's logically one change to be made, though I know that's not always something a CSD author can control. Thanks, Darren
... View more
01-17-2017
01:59 PM
Hi, Ideally the underlying service would have a single toggle for Kerberos, rather than requiring the administrator / administrative tool to know that several configs must be changed at the same time. If that's not possible, then you can implement this in the bash script. There are many approaches to doing this, but in the simplest case, you can essentially:

1) Have a single parameter to control Kerberos, let's say a boolean "kerberos.enabled".

2) In the SDL, have the kerberos section reference this parameter:

"kerberos" : "${kerberos.enabled}",

3) Emit an environment variable:

"roles": [
  {
    "name": "EXAMPLE_ROLE",
    // ...
    "startRunner": {
      // ...
      "environmentVariables": {
        "KERBEROS_ENABLED" : "${kerberos.enabled}",
        // ...

4) In your control script, check the value of $KERBEROS_ENABLED, and then add configs to the XML accordingly.

One nice way to accomplish the last step is to use additionalConfigs to create a placeholder value, then use perl to replace it. So your SDL would have:

"roles": [
  {
    "name": "EXAMPLE_ROLE",
    // ...
    "configWriter": {
      "generators": [
        {
          "filename": "example.xml",
          // ...
          "additionalConfigs" : [
            {
              "key" : "kerb.prop.a",
              "value" : "{{KERB_PROP_A}}"
            },
            {
              "key" : "kerb.prop.b",
              "value" : "{{KERB_PROP_B}}"
            },
            // ...

Then your control script would implement the conditional logic you need, then replace those variables:

if [[ ${KERBEROS_ENABLED} == "true" ]]; then
  KERB_PROP_A=1
  KERB_PROP_B=2
else
  KERB_PROP_A=100
  KERB_PROP_B=121
fi
perl -pi -e "s#\{\{KERB_PROP_A\}\}#${KERB_PROP_A}#" "$CONF_DIR/example.xml"
perl -pi -e "s#\{\{KERB_PROP_B\}\}#${KERB_PROP_B}#" "$CONF_DIR/example.xml"

Thanks, Darren
... View more
01-12-2017
04:44 PM
1 Kudo
Service advanced configuration snippets (safety valves) apply to daemons. The files you checked are configuration for clients. To make changes to those, use the Client advanced configuration snippet. Read the descriptions of the parameters carefully to see what is affected. CM-managed daemons use private process directories in /var/run/cloudera-scm-agent/process. You can read about it here: http://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_service_config_overview.html#cmug_topic_2_2
... View more
01-03-2017
10:25 AM
The balancer role just tells CM which host the balancer command should be executed from. Running it via API is a bit weird. You start the role to run the balancer.
... View more
12-21-2016
01:46 PM
If you have a role called "SecondaryNameNode", then that's incorrect. This is a very confusing name for the role in Hadoop: the SecondaryNameNode is only used in the non-HA scenario. In HA, you have multiple (regular) NameNode roles defined. HDFS HA, when properly configured, will have a nameservice, but there are many other steps as well. The HDFS HA setup process is particularly complicated, so it's much better if you can return to a normal non-HA state and then get the wizard to work. What issue did you hit with the Enable NameNode HA wizard? If you have a trial or enterprise license, you can use the config history page to help identify what changes you made since you had a normal, non-HA state, which can help you revert your changes.
... View more
12-21-2016
10:38 AM
1 Kudo
If you look at the Oozie config page, and search for load balancer, is that configured correctly? Did you set up HA for Oozie using the CM wizard? https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_hag_oozie_ha.html
... View more
12-20-2016
02:24 PM
Please ignore the permission error on supervisor.conf. The script that failed to update that file doesn't actually need to touch it, and a future version of Cloudera Manager updates the code so it no longer logs this spurious error. You may also want to revert the permissions on supervisor.conf so that it is not world-readable. What does the end of the stderr log say? Did you check the role logs for your FC for a relevant error message?
... View more
09-27-2016
01:13 PM
Oh sorry, I misunderstood. When you add a host, CM will automatically distribute all parcels to that host and activate them, so it matches all other hosts in the cluster.
... View more
09-27-2016
01:06 PM
Read about parcel lifecycles here: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_parcels.html#concept_k1c_wgx_pn All activated parcels are already downloaded and distributed.
... View more
09-27-2016
11:43 AM
CM will automatically de-activate conflicting parcels, which is why you didn't need to de-activate 5.5.1 first. This is by design: it would be a burden on the user to identify and de-activate all conflicting parcels themselves.

The CM UI will show the upgrade option when you are trying to activate a CDH parcel of a higher version. This will take you through the upgrade wizard, which is generally what you want. In your case, you don't really want the upgrade wizard and can use the API to directly activate 5.5.1 again. If your API script should never be run when CDH is already installed, you can add a check to your script to validate that requirement.

Rather than trying to test with a single host within a cluster, you can make a test cluster with a single host and try out your API script.
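A hedged sketch of "directly activate 5.5.1 again" with the cm_api Python client; the cluster name and the exact parcel version string are assumptions you would read from your own parcel listing (e.g. /api/v{version}/clusters/{cluster}/parcels):

from cm_api.api_client import ApiResource

api = ApiResource("cm-host.example.com", username="admin", password="admin")
cluster = api.get_cluster("Cluster 1")            # assumed cluster name

# The full version string comes from the cluster's parcel listing
parcel = cluster.get_parcel("CDH", "5.5.1-1.cdh5.5.1.p0.11")

# An already-distributed parcel can be activated directly; otherwise download
# and distribute it first (start_download() / start_distribution())
parcel.activate()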
... View more
09-27-2016
11:20 AM
Hi Keagles, CM always distributes parcels to the entire cluster. It's generally much easier to manage your cluster when all hosts are running the same software, and the overhead of distributing and storing these binaries is generally very small compared to the real cluster activity and size of your real data. You can easily deactivate, undistribute, and delete parcels if you don't like the change you made. You can also test things out by just downloading the parcel via the API, then checking in the CM UI that you got the right one.
... View more