Member since: 10-14-2015
Posts: 165
Kudos Received: 63
Solutions: 27
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1114 | 12-11-2018 03:42 PM
 | 820 | 04-13-2018 09:17 PM
 | 681 | 02-08-2018 06:34 PM
 | 1877 | 01-24-2018 02:18 PM
 | 3031 | 10-11-2017 07:27 PM
06-29-2017
12:22 PM
The ID and the definition name are required fields when dealing with alerts, and are always returned. They don't hurt anything if you're not using them and don't incur a cost when retrieving them, so it's not really an issue if you just ignore them. All dates in Ambari are returned in the Java epoch format, which is the number of milliseconds since January 1, 1970 UTC. There are many tools which can convert these for you in a variety of languages and environments.
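For example, a minimal Python sketch of the conversion (the timestamp value here is just an illustration):

```python
from datetime import datetime, timezone

def ambari_ts_to_datetime(millis):
    """Convert an Ambari timestamp (milliseconds since the Unix epoch) to an aware datetime."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)

# 1498738920000 ms corresponds to 2017-06-29 12:22:00 UTC
print(ambari_ts_to_datetime(1498738920000))
```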
06-27-2017
06:25 PM
If the DN is indeed going down, an alert should trigger as well. Can you post your DN log here in its entirety so we can see why it might be failing?
06-27-2017
12:51 PM
1 Kudo
These are usually caused by the alerts framework doing a port check on the DataNode - any unknown wire communication causes them to dump out an exception - they're harmless. What makes you think that the DataNode is actually going down? If it was, you'd see it shutting down in the logs.
06-21-2017
09:37 PM
2 Kudos
You can use a script dispatcher to essentially add all kinds of custom functionality to Ambari alerts. The idea is that instead of sending an Email or an SNMP trap, Ambari would invoke a script that you wrote with the parameters of the alert. Your script could then contact any 3rd party system and do things like create tickets. https://cwiki.apache.org/confluence/display/AMBARI/Creating+a+Script-based+Alert+Dispatcher
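As a sketch of what such a dispatcher might look like in Python (the argument order shown is an assumption; check the wiki page above for the exact contract in your Ambari version, and the ticketing call is left as a placeholder):

```python
#!/usr/bin/env python
# Hypothetical Ambari script-based alert dispatcher. Ambari passes alert
# details as command-line arguments; the (name, label, service, state, text)
# ordering below is an assumption -- see the wiki for the exact contract.
import sys

def format_ticket(definition_name, definition_label, service_name, alert_state, alert_text):
    """Build a ticket summary a 3rd-party ticketing system could consume."""
    return "[%s] %s (%s): %s" % (alert_state, definition_label, service_name, alert_text)

if __name__ == "__main__" and len(sys.argv) >= 6:
    summary = format_ticket(*sys.argv[1:6])
    # Here you would call your ticketing system's API; we just print it.
    print(summary)
```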
06-14-2017
08:56 PM
In that case, I don't think you'll be able to vary the sender and the reply-to addresses. Is there a reason that you need to, though? Can you simply use the existing mail.smtp.from property to specify your reply address?
06-14-2017
06:50 PM
Ambari uses JavaMail under the hood to send email notifications. I think what you're asking is for the Sender/From and Reply-To addresses to be different. Although JavaMail does support this, I don't believe it supports this via properties (at least it's not documented that you can). While you can set "mail.smtp.from", you cannot set a different reply address in a similar manner. The JavaMail code, however, does reference a "mail.reply.to" property which you could try.
06-09-2017
12:46 PM
Aggregation of this data is really a metrics-related concern; the alerts framework won't aggregate data for you. It sounds like you've defined a host-level alert which can get the CPU usage of each DataNode. That host alert can trigger if the CPU usage is above a set threshold - say, 80% for WARN and 90% for CRITICAL. On top of it, you can create an "AGGREGATE" alert type which essentially looks for a percentage of problems across the cluster: when X% of the host-level alerts are triggered, the aggregate alert triggers. If you set this to 20%, then in a cluster of 10 hosts, when 2 of them have CPU levels above the thresholds, it will fire. If you wanted a single alert against an aggregate value instead, you'd need to feed the CPU usage data into Ambari Metrics somehow. Once it's in there, you could query it with another custom alert.
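As a rough sketch, an AGGREGATE definition wrapping a hypothetical host-level alert named "datanode_cpu_usage" might look something like this (the names are made up, and the field layout follows the pattern of the other alert definitions on this page; check a stack's alerts.json for the exact structure in your version):

```json
{
  "AlertDefinition": {
    "name": "datanode_cpu_aggregate",
    "label": "Aggregate DataNode CPU",
    "service_name": "HDFS",
    "interval": 1,
    "scope": "SERVICE",
    "enabled": true,
    "source": {
      "type": "AGGREGATE",
      "alert_name": "datanode_cpu_usage",
      "reporting": {
        "ok": { "text": "affected: [{1}], total: [{0}]" },
        "warning": { "text": "affected: [{1}], total: [{0}]", "value": 10 },
        "critical": { "text": "affected: [{1}], total: [{0}]", "value": 20 },
        "units": "%"
      }
    }
  }
}
```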
06-08-2017
12:50 PM
1 Kudo
Can you provide more details about what you want to be alerting on? When you say "script data from your datanodes", what are you referring to? In general, the only data which can be passed to a custom script alert are the configurations of the cluster. The script alert would then take whatever configurations it needs and then check "something". If you need to pull data from all of your DataNodes, that could be quite a bit of work depending on the size of the cluster.
05-30-2017
02:37 PM
UNKNOWN alerts happen when data like metrics can't be retrieved. That's what is happening here. The fact that you have a CRITICAL alert for the DataNode Web UI indicates that the DataNode is down.
05-25-2017
01:40 PM
You'll need to re-generate certificates on the Ambari Server since they are expired: https://community.hortonworks.com/articles/68799/steps-to-fix-ambari-server-agent-expired-certs.html
05-25-2017
12:34 PM
Heartbeats can be lost if an exception occurs while Ambari Server is handling the heartbeat. It can also happen if there is an SSL certificate issue between server and agent. Can you please attach the ambari-server log and a log from the ambari-agent?
05-19-2017
01:31 PM
If you could paste the exact error message you're getting, that would help. Ambari uses JavaMail to send alerts via SMTP. When creating/editing a notification in the UI, there's a section all the way at the bottom where you can "Add Property" - it should be right below the "TLS" checkbox. Here, you can supply any JavaMail property you need. For example, you could add the property "mail.smtp.ssl.trust" set to the value of "*" (without quotes).
04-24-2017
11:35 PM
You're hitting an issue with the Ambari Server upgrade from 2.4.2 to 2.5.0.3 - as part of this upgrade, we need to drop and re-create the primary key on the hostcomponentdesiredstate table. The error you're getting indicates that the primary key already exists and thus can't be added again. In your logs, you might see a statement like:

Unable to determine the primary key constraint name for hostcomponentdesiredstate

I'd like to know why this might be happening (it could be an artifact of how your Oracle DB is installed). In any event, you should be able to correct this by hand and re-run the upgrade:

ALTER TABLE hostcomponentdesiredstate DROP CONSTRAINT PK_hostcomponentdesiredstate;
ALTER TABLE hostcomponentdesiredstate ADD CONSTRAINT PK_hostcomponentdesiredstate PRIMARY KEY (id);

Now you can retry "ambari-server upgrade".
04-12-2017
03:54 PM
To help you, we'd need some more information:
- Which version of HDP are you actually running currently? Is it 2.5.3.0-37?
- Can you post the entire output from the install command?
- What is the content of /usr/hdp on the host which is having trouble?
04-11-2017
01:58 PM
1 Kudo
It looks like there might be a problem with the repository you are using. This error suggests that hdp-select doesn't exist in your repo:

resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/bin/yum -d 0 -e 0 -y install hdp-select' returned 1. Error: Nothing to do

Can you verify that /etc/yum.repos.d/HDP.repo exists and has the correct repository listed? You can also try a "yum clean all" on that host.
03-27-2017
12:58 PM
Props go to @Nate.

POST api/v1/clusters/<YOUR-CLUSTER-NAME>/requests

{
"RequestInfo": {
"command": "RESTART",
"context": "Restart all ZK on the selected hosts",
"operation_level": {
"level": "HOST",
"cluster_name": "YOUR-CLUSTER-NAME"
}
},
"Requests/resource_filters": [
{
"service_name": "ZOOKEEPER",
"component_name": "ZOOKEEPER_CLIENT",
"hosts_predicate": "HostRoles/component_name=ZOOKEEPER_CLIENT"
}
]
}
03-09-2017
11:50 PM
Although this will technically work, there is a supported way of doing this. The Falcon alert definition can specify the parameter to monitor for determining whether to use HTTP or HTTPS:

{
"name": "falcon_server_webui",
"label": "Falcon Server Web UI",
"description": "This host-level alert is triggered if the Falcon Server Web UI is unreachable.",
"interval": 1,
"scope": "ANY",
"enabled": true,
"source": {
"type": "WEB",
"uri": {
"http": "{{falcon-env/falcon_port}}",
"https": "{{falcon-env/falcon_port}}",
"https_property": "{{hdfs-site/falcon.enableTLS}}",
"https_property_value": "true",
"default_port": 15000,
"kerberos_keytab": "{{falcon-startup.properties/*.falcon.http.authentication.kerberos.keytab}}",
"kerberos_principal": "{{falcon-startup.properties/*.falcon.http.authentication.kerberos.principal}}",
"connection_timeout": 5
},
"reporting": {
"ok": {
"text": "HTTP {0} response in {2:.3f}s"
},
"warning": {
"text": "HTTP {0} response from {1} in {2:.3f}s ({3})"
},
"critical": {
"text": "Connection failed to {1} ({3})"
}
}
}
}
Falcon should respect the port, regardless of plaintext vs encryption. However, this way, the alert framework will understand whether to use plaintext or TLS.
03-09-2017
04:38 PM
The logs indicate that the port on the host for MySQL isn't open. Your CLI tests indicate it is. One of them has to be wrong 🙂 Can you do a grep jdbc /etc/ambari-server/conf/ambari.properties and see if the DB properties look correct?
03-03-2017
01:40 PM
2 Kudos
Yes, I believe that you can. There is a folder which ships with Ambari Server in /var/lib/ambari-server/resources/custom_actions/scripts. You can have Ambari execute these scripts on the agents. For example, when you create a new cluster, Ambari "checks the hosts" for things like memory, OS, and known problems. This is the check_host.py script, and it's invoked like:

{
"RequestInfo": {
"action": "check_host",
"context": "Check host",
"parameters": {
"check_execute_list": "host_resolution_check",
"jdk_location": "http://192.168.64.1:8080/resources/",
"threshold": "20",
"hosts": "c6401.ambari.apache.org,c6402.ambari.apache.org,c6403.ambari.apache.org"
}
},
"Requests/resource_filters": [
{
"hosts": "c6401.ambari.apache.org,c6402.ambari.apache.org,c6403.ambari.apache.org"
}
]
}
Where "action" is the name of the script. The action is defined in /var/lib/ambari-server/resources/custom_action_definitions/system_action_definitions.xml like so:

<actionDefinition>
<actionName>check_host</actionName>
<actionType>SYSTEM</actionType>
<inputs/>
<targetService/>
<targetComponent/>
<defaultTimeout>60</defaultTimeout>
<description>General check for host</description>
<targetType>ANY</targetType>
<permissions>HOST.ADD_DELETE_HOSTS</permissions>
</actionDefinition>
02-28-2017
02:41 PM
I think that the cluster must already be installed for the cluster name to show up in those files. Once it's installed, any call to recommendations should place it in hosts.json.
02-24-2017
02:05 PM
1 Kudo
It looks like the cluster name is stored in the hosts.json file. You should be able to access it like this:

for host in hosts["items"]:
    cluster_name = host["Hosts"]["cluster_name"]
02-17-2017
12:56 PM
Yes, my example is correct. There is no way to query directly for a specific property; you can only query by bean name. However, for alerts, we use a slash as a delimiter. The metric alert will remove the "VolumeFailuresTotal" and retrieve the "Hadoop:service=NameNode,name=FSNamesystemState" bean. Then it will extract the "VolumeFailuresTotal" metric.
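That split-and-extract logic can be illustrated with a short Python sketch (the response dict below is a stubbed-down stand-in for what the NameNode's /jmx endpoint returns):

```python
def extract_metric(jmx_response, metric_path):
    """Split an Ambari-style metric path ("<bean name>/<property>") on the
    last slash and pull the property out of a parsed /jmx response."""
    bean_name, _, prop = metric_path.rpartition("/")
    for bean in jmx_response.get("beans", []):
        if bean.get("name") == bean_name:
            return bean.get(prop)
    return None

# Stubbed /jmx payload; a real response carries many more beans and fields.
response = {"beans": [
    {"name": "Hadoop:service=NameNode,name=FSNamesystemState",
     "VolumeFailuresTotal": 0},
]}
print(extract_metric(
    response,
    "Hadoop:service=NameNode,name=FSNamesystemState/VolumeFailuresTotal"))
```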
02-16-2017
10:05 PM
Sure, you'd need to execute a POST to create the new alert:

POST api/v1/clusters/<cluster-name>/alert_definitions

{
"AlertDefinition": {
"component_name": "NAMENODE",
"description": "This service-level alert is triggered if the total number of volume failures across the cluster is greater than the configured critical threshold.",
"enabled": true,
"help_url": null,
"ignore_host": false,
"interval": 2,
"label": "NameNode Volume Failures",
"name": "namenode_volume_failures",
"scope": "ANY",
"service_name": "HDFS",
"source": {
"jmx": {
"property_list": [
"Hadoop:service=NameNode,name=FSNamesystemState/VolumeFailuresTotal"
],
"value": "{0}"
},
"reporting": {
"ok": {
"text": "There are {0} volume failures"
},
"warning": {
"text": "There are {0} volume failures",
"value": 1
},
"critical": {
"text": "There are {0} volume failures",
"value": 1
},
"units": "Volume(s)"
},
"type": "METRIC",
"uri": {
"http": "{{hdfs-site/dfs.namenode.http-address}}",
"https": "{{hdfs-site/dfs.namenode.https-address}}",
"https_property": "{{hdfs-site/dfs.http.policy}}",
"https_property_value": "HTTPS_ONLY",
"kerberos_keytab": "{{hdfs-site/dfs.web.authentication.kerberos.keytab}}",
"kerberos_principal": "{{hdfs-site/dfs.web.authentication.kerberos.principal}}",
"default_port": 0,
"connection_timeout": 5,
"high_availability": {
"nameservice": "{{hdfs-site/dfs.internal.nameservices}}",
"alias_key": "{{hdfs-site/dfs.ha.namenodes.{{ha-nameservice}}}}",
"http_pattern": "{{hdfs-site/dfs.namenode.http-address.{{ha-nameservice}}.{{alias}}}}",
"https_pattern": "{{hdfs-site/dfs.namenode.https-address.{{ha-nameservice}}.{{alias}}}}"
}
}
}
}
}
This will create a new METRIC alert which runs every 2 minutes.
02-16-2017
01:45 PM
2 Kudos
It depends on how you want to monitor the failed disks. You can always write your own script alert in Python to monitor the various disks. However, if the NameNode exposes a JMX metric for this, you can create a much simpler metric alert. It seems like Hadoop:service=NameNode,name=NameNodeInfo/LiveNodes contains escaped JSON describing every DataNode; metric alerts can't parse that, but there is a simpler global failed-volume metric: Hadoop:service=NameNode,name=FSNamesystemState/VolumeFailuresTotal. You could use that metric to monitor failures. If either of these approaches sounds feasible, I can try to point you in the right direction for creating the alert.
02-14-2017
06:57 PM
Currently no, there is not. I believe there is a Jira open for changing how we send arguments to a script dispatcher so they are parameterized. Some fields, such as host name and component name, are not always present, so the current model simply omits them.
02-07-2017
02:03 PM
I think this goes back to the whole "dead is bad" theory. If I recall correctly, there was a metric Ambari once monitored on HBase - it was for "Dead RegionServers". We incorrectly assumed that "dead" was "bad". Because of this, while decommissioning a RegionServer, alerts would trigger (and not go away for a long time). In the end, it was determined that this metric wasn't really something which needed alerting on. HDFS is a little different - I believe that a DataNode is marked as stale if it hasn't reported in within 30 seconds and marked as dead if it hasn't reported within 1 minute. The problem here is that the NameNode takes action in this case - it will begin replicating blocks when it believes a DataNode is dead. So we alert on it, since it's something that is actively causing changes in the cluster data. The NameNode actually has metrics for differentiating "dead" vs "decommissioning dead":

"NumLiveDataNodes": 3,
"NumDeadDataNodes": 1,
"NumDecomLiveDataNodes": 0,
"NumDecomDeadDataNodes": 1

In the above example, Ambari won't worry about dead nodes which are known to be decommissioning, but we will worry about those which are unexpected.
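The distinction boils down to simple arithmetic, sketched here in Python (assuming NumDeadDataNodes includes decommissioning nodes, as the example above implies):

```python
def unexpected_dead_datanodes(metrics):
    """Dead DataNodes that are NOT part of a known decommission."""
    return metrics["NumDeadDataNodes"] - metrics["NumDecomDeadDataNodes"]

metrics = {"NumLiveDataNodes": 3, "NumDeadDataNodes": 1,
           "NumDecomLiveDataNodes": 0, "NumDecomDeadDataNodes": 1}
print(unexpected_dead_datanodes(metrics))  # -> 0: the one dead node is decommissioning
```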
02-07-2017
01:47 PM
1 Kudo
Can you specify which alert is being triggered? Most likely, it's an alert based on a master service's metrics. For example, if you decommission a DataNode and place that DataNode into Maintenance Mode, then Ambari won't fire alerts for it. However, if the NameNode broadcasts a metric indicating there's a problem with the liveliness of the DataNodes, then Ambari will display that alert. This is because the master service is running on a separate machine and doesn't care about the maintenance mode of the affected slave. Each service is different - some services understand that a decommission means the node shouldn't be considered stale, and some still report the staleness metric for a short period of time.
02-06-2017
01:19 PM
You can define an alert dispatcher (a python script) which Ambari will invoke when alerts fire: https://cwiki.apache.org/confluence/display/AMBARI/Creating+a+Script-based+Alert+Dispatcher
02-06-2017
01:19 PM
1 Kudo
If you're asking if you can take Ambari Alerts and publish them using PutSlack, I think what you're looking for is a script dispatcher: https://cwiki.apache.org/confluence/display/AMBARI/Creating+a+Script-based+Alert+Dispatcher When Ambari triggers an alert, it can invoke a custom python script. We used to use this for dispatching SNMP notifications before it was its own supported type. But you could use it for anything, including pushing data to a URL.
02-01-2017
12:58 PM
The Alert History endpoint can provide you what you need: https://github.com/apache/ambari/blob/trunk/ambari-server/docs/api/v1/alerts.md#alert-history This allows you to query for alerts through the REST API by name, service, criticality, etc.
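For instance, a small Python helper to build such a query URL (the predicate field names follow the AlertHistory/* convention in the linked docs; the host and cluster names here are placeholders):

```python
def alert_history_url(base, cluster, definition_name=None, state=None):
    """Build an Alert History REST query, optionally filtered by
    alert definition name and/or state (OK, WARNING, CRITICAL, UNKNOWN)."""
    url = "%s/api/v1/clusters/%s/alert_history" % (base, cluster)
    predicates = []
    if definition_name:
        predicates.append("AlertHistory/definition_name=%s" % definition_name)
    if state:
        predicates.append("AlertHistory/state=%s" % state)
    if predicates:
        url += "?" + "&".join(predicates)
    return url

print(alert_history_url("http://ambari.example.com:8080", "MyCluster", state="CRITICAL"))
```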