Member since: 10-14-2015
165 Posts | 63 Kudos Received | 27 Solutions

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1310 | 12-11-2018 03:42 PM
 | 1003 | 04-13-2018 09:17 PM
 | 823 | 02-08-2018 06:34 PM
 | 2111 | 01-24-2018 02:18 PM
 | 3680 | 10-11-2017 07:27 PM
05-02-2019
01:15 PM
The aggregate alerts take into account the status of individual instance alerts. The "Percent DataNodes Available" alert is tripping because your decommissioned DataNode is still being managed by Ambari. You either need to remove the DataNode or put that specific instance into Maintenance Mode.
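For example, a sketch of putting just that DataNode instance into Maintenance Mode over the REST API (the host, cluster name, and credentials below are placeholders):

# Host, cluster, and credentials are placeholders; adjust for your environment
curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"RequestInfo": {"context": "Turn on Maintenance Mode for DataNode"},
       "Body": {"HostRoles": {"maintenance_state": "ON"}}}' \
  http://ambari.example.com:8080/api/v1/clusters/MyCluster/hosts/dn1.example.com/host_components/DATANODE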
02-11-2019
02:31 PM
Unless you are going to rebuild those services with a new version number and package them all together, what you are attempting to do won't work. All services from a repository must advertise the same version, and the version-select tool in use (say, hdp-select) must be able to interact with all of them, which means it would also need to support your custom services. There's no way to make custom services upgradeable if they don't actually belong to that stack and advertise the same version.

Take HDP and HDF, for example. When you install either, you'll get ZooKeeper. However, the ZooKeeper which comes with HDP is packaged differently from the one which comes with HDF. The repositories used to install HDF will install ZooKeeper from a version which matches the HDF version. This allows the HDF stack to be managed as an upgradable stack, since all of the components are built together. On the other hand, if you just drop an HDF service, like NIFI, into an HDP cluster, the NIFI service won't be upgradable since it's not truly a part of that stack. It won't broadcast its version, and it won't participate in upgrades.
02-11-2019
02:31 PM
1 Kudo
It looks like you are attempting to register a repository where the bits are actually 3.0.1.0-87, yet you expect the repository to report 3.0.1.0.1. That is not how stack inheritance works in Ambari. Yes, you can inherit a stack in order to get its configurations, alerts, scripts, etc. However, your new stack is going to be installed from a yum repository which it defines (not inherits). In this case, you are specifying the HDP-3.0.1.0 repository URLs for the RPMs, which means it will install 3.0.1.0-87.

The version information of a package is used for managing a cluster with respect to upgrades. Unless you are actually rebuilding the binaries with a brand new version, your 3.0.1.0.1 stack will never install. The wildcard 3_0_1_0_* also will never work here, since it becomes ambiguous which repository you want to install from. So, even if you had rebuilt all of the binaries and provided correct repository URLs, a `yum install zookeeper_3_0_1_0_*` would produce random results.
12-13-2018
08:12 PM
In order to associate an `AlertTarget` with specific groups, you would use the `groups` property along with an array of IDs for the groups you care about:

{
  "AlertTarget": {
    "name": "Administrators",
    "description": "The Admins",
    "notification_type": "EMAIL",
    "groups": [1, 17, 23],
    ...
  }
}
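To actually create that target, you would POST the payload to the alert_targets endpoint. A sketch, with placeholder host and credentials, and with the notification-type-specific properties from the "..." above still omitted (EMAIL targets need those as well):

curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  -d '{"AlertTarget": {"name": "Administrators", "description": "The Admins", "notification_type": "EMAIL", "groups": [1, 17, 23]}}' \
  http://ambari.example.com:8080/api/v1/alert_targets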
12-11-2018
03:42 PM
I agree with Akhil; we'd need some more information about what you are trying to do. However, his links are a great place to start. It also seems like you're trying to use the REST APIs directly, in which case this link might be of some help, since it gives examples for using the alert groups and targets endpoints: https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/alert-dispatching.md
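For example, assuming placeholder host, cluster name, and credentials, listing the alert groups that already exist looks something like:

curl -u admin:admin http://ambari.example.com:8080/api/v1/clusters/MyCluster/alert_groups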
09-21-2018
04:25 PM
PUT api/v1/clusters/<clusterName>/alert_definitions/<id>
{
  "AlertDefinition/enabled": false
}
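As a concrete sketch via curl (host, cluster, credentials, and the definition ID are placeholders; note that Ambari requires the X-Requested-By header on modifying requests):

curl -u admin:admin -H 'X-Requested-By: ambari' -X PUT \
  -d '{"AlertDefinition/enabled": false}' \
  http://ambari.example.com:8080/api/v1/clusters/MyCluster/alert_definitions/42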
07-26-2018
12:59 PM
It's a timeout problem. The alert gives the beeline command 60 seconds to spin up a JVM and connect to Hive. You can always go to the alert's definition in the Ambari UI and raise this timeout property (to, say, 75 seconds). However, before you do that, you might want to run the command yourself and see how long it takes. If it's taking more than a minute, that could indicate a problem with resources on this host.
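One way to time it yourself is something like the following; the JDBC URL is a placeholder and should match whatever your HiveServer2 actually uses (including any Kerberos/SSL options):

time beeline -u 'jdbc:hive2://hive-host.example.com:10000' -e 'show databases;'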
06-20-2018
09:04 PM
You could also be seeing the issue described here: https://community.hortonworks.com/questions/166963/upgrade-ambari-server-252-to-261.html Basically, if your cluster's state is not CURRENT, the upgrade won't know what repository to work with when adjusting foreign keys and whatnot.
04-16-2018
01:26 PM
Ah, sorry, try /var/log/hadoop-yarn/yarn
04-13-2018
09:37 PM
The disk usage alert runs every few minutes. If it hasn't cleared, then perhaps you didn't add enough storage. If you check the alert's message, you can see why it thinks you don't have enough space, and you can verify your new mounts. The logs would be in /var/log/hadoop/yarn on the ResourceManager host.
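For example, to verify that the new mounts are visible and have the expected free space:

# Compare the reported free space against the thresholds in the alert text
df -h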
04-13-2018
09:17 PM
Your disk usage alerts are still there because they are valid. They won't clear until you resolve the problem (by adding more space to your hosts) or edit the alert and increase the threshold which triggers it. The ResourceManager alert also seems real, since you can't log in to it and the UI indicates it's not running. I would check the RM logs on that host to see why it's having problems.
03-28-2018
09:09 PM
It depends on which service and component. Some only do simple checks like port and PID status. Hive Server actually runs beeline commands, and Hive Metastore runs a `show databases` command.
03-21-2018
07:23 PM
Yes, you can - the above calls can capture it on a host/component basis.
03-21-2018
04:54 PM
That is the correct way to check for Maintenance Mode being enabled. Which version of Ambari are you running? When I try this locally, the correct value for `maintenance_state` is reflected. Also, you're putting the ZooKeeper service itself into MM, right? Putting individual hosts/components into MM won't be reflected here.
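For reference, this is the kind of call I'm testing with (host, cluster, and credentials are placeholders):

curl -u admin:admin 'http://ambari.example.com:8080/api/v1/clusters/MyCluster/services/ZOOKEEPER?fields=ServiceInfo/maintenance_state'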
02-08-2018
06:34 PM
The alerts.json files are only used to seed alert definitions initially. After an alert definition has been created in the system, modifications to it must be made through the REST API. See: https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/alert-definitions.md#create You can also delete the alert definition entirely; this will cause alerts.json to be read in again on the next Ambari Server restart.
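The delete would look something like this (host, cluster, credentials, and the definition ID are placeholders):

curl -u admin:admin -H 'X-Requested-By: ambari' -X DELETE \
  http://ambari.example.com:8080/api/v1/clusters/MyCluster/alert_definitions/42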
01-24-2018
03:03 PM
Let me see if I can help you through this. Can you run the following query for me:

SELECT repo_version_id, version, display_name FROM repo_version ORDER BY version;

This will return a list that looks something like:

 repo_version_id |   version    |   display_name
-----------------+--------------+------------------
               1 | 2.5.0.0-1237 | HDP-2.5.0.0-1237
             101 | 2.5.4.0-121  | HDP-2.5.4.0-121
              51 | 2.6.0.0-334  | HDP-2.6.0.0-334

Chances are the most recent version is the one that you're on (or are at least supposed to be on). In my case, this is ID 51. So, you would run:

UPDATE cluster_version SET state = 'CURRENT' WHERE repo_version_id = 51;

The upgrade should work after making this kind of change.
01-24-2018
02:18 PM
It seems like your cluster was not in a consistent state. Ambari needs to know what repository/version your cluster is "CURRENT" on, and we track this via the cluster_version table. There will be a bunch of entries in there, one for each repository you've ever installed. It's OK for most of them to be OUT_OF_SYNC - that's expected. However, exactly one must be "CURRENT". I would find your correct repository version in the repo_version table and then update the corresponding row in the cluster_version table. Something like this:

UPDATE cluster_version SET state = 'CURRENT' WHERE repo_version_id = <some-id>;
01-10-2018
10:00 PM
For REST APIs, you can check https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/index.md
12-07-2017
02:58 PM
1 Kudo
Hi Mudassar, Generally it's better to open a new issue instead of tacking onto an existing one, since the problem/resolution could be very different. To answer your question: no, you can't clear it in this case. This is a metric alert coming from HDFS. The HDFS service is broadcasting that 1 DataNode is considered dead; Ambari is simply detecting this and alerting on it. You'll need to figure out why the NameNode is sending that metric. Normally, I think the NN considers a DataNode "dead" after more than a few minutes of lost contact (without a decommission). However, if the DataNode makes contact again, the NN should clear it.
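On recent Hadoop releases, you can ask the NameNode directly which DataNodes it currently considers dead:

# Lists only the DataNodes the NameNode has marked dead
hdfs dfsadmin -report -dead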
10-11-2017
07:27 PM
2 Kudos
You can clear an individual alert's state by disabling it and then re-enabling it. This will cause all active instances of that alert to disappear, and it will run clean. On some versions of Ambari, this was required when you did things like delete a host, which could leave orphaned alerts that never run again (and thus become stale). If you are seeing the actual "Stale Alert" trigger, you'll want to identify which alerts are causing it to fire - in other words, which alerts are not running. Disabling/enabling those could help, but if they continue to be stale, then something else is going on which is preventing them from running.
10-05-2017
06:33 PM
You should wget the address specified in the error - the http://hostname:8042 address. The alerts run on their respective hosts. If the alert is for a NodeManager, then every NodeManager in your system will attempt to connect to its own FQDN.
10-05-2017
05:14 PM
Let's take the first alert: Connection failed to http://hostname:8042 (<urlopen error timed out>). Remember that this is being run from the specific host to itself. So, in order to verify that things work, you'd need to first log in to that host and then run wget from there against the FQDN in the error above. Also, check your environment for any possible proxy settings, like "export http_proxy".
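For example, run something like this on the alerting host itself (not from a remote machine):

# Fetch the NodeManager UI from this host's own FQDN
wget -O /dev/null "http://$(hostname -f):8042"

# Look for proxy settings that could interfere with the connection
env | grep -i proxy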
10-04-2017
11:44 PM
Sometimes, even though a service is running, there can be alerts triggering for it. Some examples include a service which is technically up but returns a 500 error on its web management page, or a metric alert firing because of a value outside the acceptable range. Can you please provide more context on which alerts are present? In some older versions of Ambari, alerts would remain for hosts/services which were removed, so if you're using an older version it could be that as well.
09-11-2017
06:12 PM
There are several alerts which cover heap. For example, if you wanted to see the raw DataNode heap value, then you'd check the metric alert definition for it:
https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json#L1714-L1755
For NameNode, we don't alert on heap directly, but we measure average heap deviation values:
https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json#L868-L947
https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/package/alerts/alert_metrics_deviation.py
09-06-2017
11:40 PM
@Dinesh Chitlangia Glad you got it resolved. By any chance, did you determine which alert definition it was and what the invalid JSON was? We might want to file a bug if we can figure out how it happened.
09-06-2017
01:56 PM
1 Kudo
The JSON of the alert definition must have an invalid property set on it. Not sure how it happened, but there are a couple of options:
1. Find the offending alert definition and correct its JSON in the database. Chances are it's the one after yarn_app_timeline_server_webui (see the query below).
2. Remove all alert data and have Ambari bootstrap the definitions back in from the stack. This is a good option if you haven't added or modified any alert definitions beyond the ones that ship with the product.
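If you go with option 1, a query along these lines can help you locate the definition; this assumes the default Ambari schema, where definitions live in the alert_definition table:

-- Assumes the default Ambari schema; scan for the entry near yarn_app_timeline_server_webui
SELECT definition_id, definition_name
FROM alert_definition
ORDER BY definition_id;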
07-17-2017
01:54 PM
Those tables have referential integrity and should not be able to get out of sync. Are you using MySQL as your Ambari database? If so, is your default engine InnoDB or MyISAM? If it's MyISAM, that's very bad, as it doesn't support transactions or foreign keys. You would need to perform several steps to convert your database to InnoDB, or this could happen again in the future.
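To check which engine your tables are actually using (this assumes your Ambari database is named "ambari"):

-- Assumes the default database name of 'ambari'
SELECT table_name, engine
FROM information_schema.tables
WHERE table_schema = 'ambari';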
07-11-2017
05:22 PM
Sure - as I mentioned, an option that some people take is to have it run on all hosts, but to return the SKIPPED state if the script determines this host isn't of interest. Some ways to do this:
- The alert can check for the existence of the process on the file system (see the sketch below). If it's not there, it knows that this host shouldn't be included in the alerts.
- You can add a property to any configuration, including cluster-env, which lists the hosts that should be checked. The scripts have access to all configurations, so they can see if the current host is in the list.
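Here's a minimal sketch of that first approach, written as a Python script alert following the same contract as the alerts which ship with the product (an execute() function returning a (result_code, [text]) tuple); the path being checked is hypothetical:

import os

# Hypothetical location of the process binary this alert cares about
FOO_BINARY = '/usr/local/foo/bin/foo-process'

def get_tokens():
    # Configuration properties this alert needs Ambari to pass in; none here
    return ()

def execute(configurations={}, parameters=[], host_name=None):
    # Hosts without the binary are not of interest; skip them entirely
    if not os.path.exists(FOO_BINARY):
        return ('SKIPPED', ['foo-process is not installed on this host'])

    # ... the actual "is the process running" check would go here ...
    return ('OK', ['foo-process is running'])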
07-11-2017
03:37 PM
Currently, the alerts framework doesn't allow running scripts which are not Python-based. Therefore, you'd first need to convert your shell script into a Python file similar to the alerts which ship with the product. Once you have a running Python script alert, you need to figure out where to target it.

If you want it running on every host, then you'd target it similarly to how the agent host alerts work; it would be distributed to each host in your cluster. See: https://github.com/apache/ambari/blob/trunk/ambari-server/src/main/resources/alerts.json#L123-L164

If you want to target specific hosts, then you'd need to "install" a new service in Ambari which lets Ambari know where to distribute the alerts. Let's say you wanted an alert which checks whether the foo-process is running, and this process is part of the Foo service, which is installed on your master hosts only. You'd need to create a new "Service" in Ambari using the stack extension mechanism and then install the Foo Process components on the right hosts using Ambari.

Some administrators go with option #2 because they want Ambari to manage the non-Hadoop service. However, it's a bit more complicated. You can use option #1 and simply have your alert check for the existence of the process on the path. If it exists, you know to check whether it's running. If it doesn't exist, then you can return the SKIPPED alert state.
06-29-2017
06:33 PM
So Ambari says that the DN is stopped, but the alert is OK and the process is running. That sounds like a problem with the process ID check during the status commands. Does this file exist: /var/run/hadoop/hadoop-hdfs-datanode.pid? That file would contain the PID of the DN. The path may be customized in your environment, but chances are it's not. To reset things:
- Stop the DN in Ambari
- Remove this file by hand
- Check that the DN is stopped using ps
- Start the DN back up in Ambari
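A quick way to compare the PID recorded in that file against what's actually running (path taken from above; adjust if your environment customizes it):

# Show whether the recorded PID corresponds to a live DataNode process
PID=$(cat /var/run/hadoop/hadoop-hdfs-datanode.pid)
ps -p "$PID" -o pid,args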