Member since: 07-30-2013
Posts: 509
Kudos Received: 113
Solutions: 124

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1511 | 07-09-2018 11:54 AM
 | 1414 | 05-03-2017 11:03 AM
 | 3195 | 03-28-2017 02:27 PM
 | 1092 | 03-27-2017 03:17 PM
 | 1021 | 03-13-2017 04:30 PM
05-20-2019
01:16 PM
Sentry HA is supported as of CM / CDH 5.13.0: https://www.cloudera.com/documentation/enterprise/latest/topics/sg_sentry_ha.html

To enable Sentry HA via the API, after meeting the prerequisites, you will need to:
1) Add an extra Sentry Server role to a host.
2) You may need to go to the Hive service and enable Stored Notifications in Database (an HMS group setting).
3) Generally follow the public doc instructions for enabling Sentry HA with rolling restart, but use API commands instead of GUI actions.
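As a rough illustration of steps 1 and 3, here's a hedged sketch using the cm_api Python client. The cluster name, service name, role name, and host id are assumptions to replace with your own values, and the exact restart/refresh sequence should still follow the public doc.

from cm_api.api_client import ApiResource

# Connect to Cloudera Manager (host and credentials are placeholders)
api = ApiResource("cm-host.example.com", username="admin", password="admin")
cluster = api.get_cluster("Cluster 1")        # assumed cluster name
sentry = cluster.get_service("sentry")        # assumed Sentry service name

# Step 1: add an extra Sentry Server role on another host
sentry.create_role("sentry-SENTRY_SERVER-2",  # assumed role name
                   "SENTRY_SERVER",
                   "new-host-id")             # assumed host id

# Step 3: restart Sentry; the public doc describes doing this as a rolling restart
sentry.restart().wait()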
... View more
07-09-2018
11:54 AM
1 Kudo
Hi, When creating your cluster, Cloudera Manager should automatically detect the directories on each host. You can then use Role Configuration Groups to set distinct configurations for the 10-disk nodes and the 20-disk nodes, and divide roles appropriately between those groups. dfs.data.dir isn't global; it is a role config, so it is usually set in the Role Config Group for a role. You can read more about configuration management here: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_intro_primer.html#concept_fgj_tny_jk

When you add new DataNodes, I suggest creating a host template and applying it to your new nodes, allowing them to easily join the correct DataNode group as well as any other roles you may be running on those nodes (like a YARN NodeManager). You can read about host templates here: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_host_templates.html

Thanks, Darren
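If you script this with the cm_api Python client, a hedged sketch of applying an existing host template to new hosts looks roughly like this; the cluster name, template name, and host ids are assumptions:

from cm_api.api_client import ApiResource

api = ApiResource("cm-host.example.com", username="admin", password="admin")
cluster = api.get_cluster("Cluster 1")                   # assumed cluster name
template = cluster.get_host_template("20-disk-worker")   # assumed template name

# Apply the template to the newly added hosts and start the created roles
template.apply_host_template(["host-id-1", "host-id-2"], start_roles=True).wait()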
... View more
04-02-2018
02:33 PM
Hi Venkat, This is a known issue fixed in the upcoming releases 5.14.2 and 5.15+. It is reported in versions as old as CM 5.12.1 (though I'm not sure whether that's the oldest affected version), and it would explain the issue you're seeing. Thanks, Darren
... View more
02-06-2018
11:16 AM
I'm not sure why that didn't work, sorry. Hopefully a YARN / Hadoop expert can chime in.
... View more
01-18-2018
04:30 PM
1 Kudo
Looks like the YARN ResourceManager WebUI will tell you the racks it sees if you click on Nodes. You could also temporarily modify your script to touch /tmp/didmyscriptrun and then make sure that file's timestamp gets updated at some point.
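For example, if your script happens to be Python, a couple of lines like these near the top would leave a marker whose timestamp you can check later (the path is just an example):

import os

# Leave evidence that the script actually ran; check this file's mtime afterwards
open("/tmp/didmyscriptrun", "a").close()
os.utime("/tmp/didmyscriptrun", None)  # refresh the timestamp on every run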
... View more
01-08-2018
11:01 AM
1 Kudo
Hi, Normally, CM leverages the rack configured by the user in the CM UI to populate the topology script for the cluster. If you override the topology script, Hadoop should use your custom one, but CM does not invoke that script when deciding which rack to display in the UI, so CM will show something different from what your cluster is actually using. The topology.map is not used if you've customized the topology script to do something else. -Darren
... View more
01-02-2018
04:00 PM
It's also worth noting that CM made some defensive changes for this issue in newer versions, so running a CM version at or above one of these may help: 5.10+, 5.9.1+, 5.8.4+, 5.7.5+. I'd be extremely curious if someone could provide consistent reproduction steps for this as well. It seems to happen quite randomly.
... View more
12-20-2017
11:01 AM
Hi, Does "zeppelin-daemon.sh start" start zeppelin as a foreground process, or does it fork? You need to start the daemon as a foreground process so that the PID that CM is monitoring is the same as the PID of the daemon. If it forks, then CM is monitoring just the shell script, sees that the shell script exits shortly after starting, thinks that is the daemon crashing out, and retries starting it up to 2 more times.
... View more
10-27-2017
04:46 PM
Was your cluster using Packages or Parcels for CDH? You can't mix packages and parcels, so this issue could be due to adding an Anaconda Parcel to a Package-based CDH install.
... View more
06-22-2017
02:31 PM
We often default this to false, because we want admins to see failures rather than potentially mask them by doing an auto-restart. You can create a lot of junk if the process is auto-restarting frequently, potentially leading to issues like running out of disk space. If admins find that there's some known problem and they want the auto-restart workaround, then they can toggle it on. Can you elaborate more on why you'd like it on by default? Are there specific common known failures that can cause your roles to die, or is it just to improve availability?
... View more
05-03-2017
11:03 AM
1 Kudo
Hi, api.get_all_hosts() returns just basic information about a host by default (a SUMMARY view). You probably want the FULL view. See the docs here: https://cloudera.github.io/cm_api/epydoc/5.11.0/cm_api.api_client.ApiResource-class.html#get_all_hosts

If you do a simple HTTP GET on {CM_HOST:PORT}/api/v{version}/hosts you can easily see the kind of data returned by api.get_all_hosts(), and if you look at {CM_HOST:PORT}/api/v{version}/hosts?view=FULL you'll see you can get more details, like the role refs.

For your use case of replacing a failed node, there's significant trickiness in getting the steps just right. You may want to look into Cloudera Director, which can repair worker or gateway nodes, among many other features. Here's the doc page for repairing a node: https://www.cloudera.com/documentation/director/latest/topics/director_ui_cluster_shrink.html

Thanks, Darren
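A small hedged sketch of the FULL view with the cm_api Python client (host and credentials are placeholders); the roleRefs field is where the role assignments show up:

from cm_api.api_client import ApiResource

api = ApiResource("cm-host.example.com", username="admin", password="admin")

# view="full" is the client-side equivalent of .../hosts?view=FULL
for host in api.get_all_hosts(view="full"):
    refs = ["%s/%s" % (r.serviceName, r.roleName) for r in host.roleRefs]
    print("%s: %s" % (host.hostname, ", ".join(refs)))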
... View more
03-28-2017
02:27 PM
Thanks for this report! This does indeed appear to be a bug (Paolo dug into it internally, credit to him) and we'll get a fix out in a future release. The "abruptly stop" step should be skipped when there are no started roles, rather than producing an error. Thanks, Darren
... View more
03-27-2017
03:17 PM
1 Kudo
Hi Shant, That's not possible today. Why do you want that? Usually admins don't want to deal with so many certs. You can use additionalConfigs to emit parameters to more places, but be careful to read the caveat about passwords. Thanks, Darren
... View more
03-27-2017
09:59 AM
1 Kudo
1. You can't prevent the abrupt stop, but you shouldn't need to. Is it actually causing a problem? It may just be skipped. Can you show any error message, or post a screenshot?
2. No, that's not possible.
... View more
03-24-2017
06:21 PM
1 Kudo
Hi, Custom stop runners at the role level are planned for a future release. Stay tuned! Until then, the only ways to stop roles are:

1) Standard stop, included by default. CM will basically send a SIGTERM to your process, and if it doesn't die after 30s, it will send a group SIGKILL. You can stop individual roles this way (select what you want on the Instances page, choose Actions for Selected -> Stop), but there's no reasonable way to run a custom stop script.

2) Service-level graceful stop. CM will run a custom script on a master role in your service, which must instruct the workers to exit normally (exit code 0); once those have exited, CM will consider the service-level stop command successful. This is only helpful if your master role can orchestrate the stop, and it will always stop all roles.

Thanks, Darren
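To make option 1 concrete, here's an illustrative Python sketch of the standard-stop behaviour described above (SIGTERM, wait about 30s, then a group SIGKILL). It only models what CM already does for you; it is not something your CSD needs to implement:

import os, signal, time

def standard_stop(pid, grace_seconds=30):
    os.kill(pid, signal.SIGTERM)                 # polite request to exit
    deadline = time.time() + grace_seconds
    while time.time() < deadline:
        try:
            os.kill(pid, 0)                      # raises OSError once the process is gone
        except OSError:
            return True                          # exited within the grace period
        time.sleep(1)
    os.killpg(os.getpgid(pid), signal.SIGKILL)   # group sigkill after the grace period
    return False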
... View more
03-13-2017
04:30 PM
1 Kudo
Hi Shant, The documentation there is incomplete. Here's the information you're looking for:

certificateLocationConfigName - Optional. Config name to emit when ssl_server_certificate_location is used in a config file. If null, ssl_server_certificate_location will not be emitted into config files, and can only be used in substitutions like ${ssl_server_certificate_location}.

certificateLocationDefault - Optional. Default value for ssl_server_certificate_location.

caCertificateLocationConfigName - Optional. Config name to emit when ssl_server_ca_certificate_location is used in a config file. If null, ssl_server_ca_certificate_location will not be emitted into config files, and can only be used in substitutions like ${ssl_server_ca_certificate_location}.

caCertificateLocationDefault - Optional. Default value for ssl_server_ca_certificate_location.

I'll get this added to the wiki in a future update.
... View more
03-02-2017
01:25 PM
Hi, It's best for systems (especially distributed systems) to not require careful ordering in startup. Instead, each process should wait for a bit for any dependency process (like the master) to come up. If possible, I also suggest that this wait period should be configurable, and at least 2 minutes in duration by default. There's no way for CSDs to control the ordering of start commands since we prefer robustness to ordering. Thanks, Darren
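As a rough sketch of that advice (host, port, and defaults are assumptions), a worker could poll its master before doing real work, with a configurable wait of at least two minutes:

import socket
import time

def wait_for_master(host, port, timeout_seconds=120):
    # Poll the master's port until it accepts connections or the timeout expires
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        try:
            conn = socket.create_connection((host, port), timeout=5)
            conn.close()
            return True      # master is reachable
        except socket.error:
            time.sleep(5)    # not up yet; retry
    return False

if not wait_for_master("master.example.com", 9000):
    raise SystemExit("master did not come up within the wait period")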
... View more
02-14-2017
04:16 PM
Depending on how you are running the job, you may be able to override the topology script parameter and/or replace the topology.py script with one that is Python 3 compatible. If you're submitting jobs from the command line, you'd usually copy /etc/hadoop/conf to some custom directory /path/to/customized/conf, make changes there, then set HADOOP_CONF_DIR=/path/to/customized/conf and run your job. Assuming you can change that topology script, here's the relevant portion of the diff that you can apply:

@@ -1,8 +1,8 @@
#!/usr/bin/env python
#
-# Copyright (c) 2010-2012 Cloudera, Inc. All rights reserved.
+# Copyright (c) 2016 Cloudera, Inc. All rights reserved.
#
-
+
'''
This script is provided by CMF for hadoop to determine network/rack topology.
It is automatically generated and could be replaced at any time. Any changes
@@ -12,8 +12,13 @@ made to it will be lost when this happens.
import os
import sys
import xml.dom.minidom
-from string import join
-
+
+try:
+ xrange
+except NameError:
+ # support for python3, which basically renamed xrange to range
+ xrange = range
+
def main():
MAP_FILE = '{{CMF_CONF_DIR}}/topology.map'
DEFAULT_RACK = '/default'
@@ -40,14 +45,14 @@ def main():
map[node.getAttribute("name")] = node.getAttribute("rack")
except:
default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
- print default_rack
+ print(default_rack)
return -1
-
+
default_rack = "".join([ DEFAULT_RACK for _ in xrange(max_elements)])
if len(sys.argv)==1:
- print default_rack
+ print(default_rack)
else:
- print join([map.get(i, default_rack) for i in sys.argv[1:]], " ")
+ print(" ".join([map.get(i, default_rack) for i in sys.argv[1:]]))
return 0
if __name__ == "__main__":
... View more
01-19-2017
11:51 AM
Hi, You're correct. CM Server displays the files as they are sent to the agent, before shell scripts are run, so they'll show the placeholder values if you used my suggested approach. This is an unfortunate limitation; we don't have server-side support for this sort of advanced custom logic today, and there's no way to update the UI either. I'll mention again that it's best if the underlying product doesn't require multiple configs to be changed when there's logically one change to be made, though I know that's not always something a CSD author can control. Thanks, Darren
... View more
01-17-2017
01:59 PM
Hi, Ideally the underlying service would have a single toggle for Kerberos, rather than requiring the administrator / administrative tool to know that several configs must be changed at the same time. If that's not possible, then you can implement this in the bash script. There are many approaches to doing this, but in the simplest case, you can essentially:

1) Have a single parameter to control Kerberos, let's say a boolean "kerberos.enabled".

2) In the SDL, have the kerberos section reference this parameter:

"kerberos" : "${kerberos.enabled}",

3) Emit an environment variable:

"roles": [
  {
    "name": "EXAMPLE_ROLE",
    // ...
    "startRunner": {
      // ...
      "environmentVariables": {
        "KERBEROS_ENABLED" : "${kerberos.enabled}",
        // ...

4) In your control script, check the value of $KERBEROS_ENABLED, and then add configs to the XML accordingly.

One nice way to accomplish the last step is to use additionalConfigs to create a placeholder value, then use perl to replace it. So your SDL would have:

"roles": [
  {
    "name": "EXAMPLE_ROLE",
    // ...
    "configWriter": {
      "generators": [
        {
          "filename": "example.xml",
          // ...
          "additionalConfigs" : [
            {
              "key" : "kerb.prop.a",
              "value" : "{{KERB_PROP_A}}"
            },
            {
              "key" : "kerb.prop.b",
              "value" : "{{KERB_PROP_B}}"
            },
            // ...

Then your control script would implement the conditional logic you need, then replace those variables:

if [[ ${KERBEROS_ENABLED} == "true" ]]; then
  KERB_PROP_A=1
  KERB_PROP_B=2
else
  KERB_PROP_A=100
  KERB_PROP_B=121
fi
perl -pi -e "s#\{\{KERB_PROP_A\}\}#${KERB_PROP_A}#" "$CONF_DIR/example.xml"
perl -pi -e "s#\{\{KERB_PROP_B\}\}#${KERB_PROP_B}#" "$CONF_DIR/example.xml"

Thanks, Darren
... View more
01-12-2017
04:44 PM
1 Kudo
Service advanced configuration snippets (safety valves) apply to daemons. The files you checked are configuration for clients. To make changes to those, use the Client advanced configuration snippet. Read the descriptions of the parameters carefully to see what is affected. CM-managed daemons use private process directories in /var/run/cloudera-scm-agent/process. You can read about it here: http://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_service_config_overview.html#cmug_topic_2_2
... View more
01-03-2017
10:25 AM
The balancer role just tells CM which host the balancer command should be executed from. Running it via API is a bit weird. You start the role to run the balancer.
... View more
12-21-2016
01:46 PM
If you have a role called "SecondaryNameNode", then that's incorrect. This is a very confusing name for the role in Hadoop: the SecondaryNameNode is only used in the non-HA scenario. In HA, you have multiple (regular) NameNode roles defined. HDFS HA, when properly configured, will have a nameservice, but there are many other steps as well. The HDFS HA setup process is particularly complicated, so it's much better if you can return to a normal non-HA state and then get the wizard to work. What issue did you hit with the Enable NameNode HA wizard? If you have a trial or enterprise license, you can use the config history page to help identify what changes you made since you had a normal, non-HA state, which can help you revert your changes.
... View more
12-21-2016
10:38 AM
1 Kudo
If you look at the Oozie config page, and search for load balancer, is that configured correctly? Did you set up HA for Oozie using the CM wizard? https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_hag_oozie_ha.html
... View more
12-20-2016
02:24 PM
Please ignore the permission error on supervisor.conf. The script that failed to update that file doesn't actually need to touch it, and a future version of Cloudera Manager updates the code so it no longer logs this spurious error. You may also want to revert the permissions on supervisor.conf so that it is not world-readable. What does the end of the stderr log say? Did you check the role logs for your FC for a relevant error message?
... View more
09-27-2016
01:13 PM
Oh sorry, I misunderstood. When you add a host, CM will automatically distribute all parcels to that host and activate them, so it matches all other hosts in the cluster.
... View more
09-27-2016
01:06 PM
Read about parcel lifecycles here: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_parcels.html#concept_k1c_wgx_pn All activated parcels are already downloaded and distributed.
... View more
09-27-2016
11:43 AM
CM will automatically de-activate conflicting parcels, which is why you didn't need to de-activate 5.5.1 first. This is by design: it would be a burden on the user to identify and de-activate all conflicting parcels themselves.

The CM UI will show the upgrade option when you are trying to activate a CDH parcel of a higher version. This will take you through the upgrade wizard, which is generally what you want. In your case, you don't really want the upgrade wizard and can use the API to directly activate 5.5.1 again. If your API script should never be run when CDH is already installed, you can add a check to your script to validate that requirement.

Rather than trying to test with a single host within a cluster, you can make a test cluster with a single host and try out your API script.
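A hedged sketch of "directly activate 5.5.1 again" with the cm_api Python client; the cluster name and the exact parcel version string are assumptions you would read from your own parcel listing (e.g. /api/v{version}/clusters/{cluster}/parcels):

from cm_api.api_client import ApiResource

api = ApiResource("cm-host.example.com", username="admin", password="admin")
cluster = api.get_cluster("Cluster 1")            # assumed cluster name

# The full version string comes from the cluster's parcel listing
parcel = cluster.get_parcel("CDH", "5.5.1-1.cdh5.5.1.p0.11")

# An already-distributed parcel can be activated directly; otherwise download
# and distribute it first (start_download() / start_distribution())
parcel.activate()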
... View more
09-27-2016
11:20 AM
Hi Keagles, CM always distributes parcels to the entire cluster. It's generally much easier to manage your cluster when all hosts are running the same software, and the overhead of distributing and storing these binaries is generally very small compared to the real cluster activity and size of your real data. You can easily deactivate, undistribute, and delete parcels if you don't like the change you made. You can also test things out by just downloading the parcel via the API, then checking in the CM UI that you got the right one.
... View more