Created on 03-02-2016 02:30 PM - edited 09-16-2022 03:06 AM
We run multiple clusters from multiple teams on the same CM instance, so the email blast for any error for all clusters isn't very useful to us.
I can pull back from the api any node that is having a problem using:
curl --silent -u <user>:<pass> http://<CM>:7180/api/v10/clusters/<cluster>/services/<cluster>-hbase/roles | grep 'hostId\|healthSummary' | grep -B 1 CONCERNING | grep hostId | awk -F ':' '{print $2}' | awk -F '"' '{print $2}'
I'm aware it's not pretty, but that's not currently the point.
This will tell me any node in the cluster that's in a concerning status. I'd like to be able to pull back from the API the reason it's in a concerning status.
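For readability, the same filtering can be sketched as a single awk pass. This is only a sketch under the same assumption the pipeline above already makes: the response is pretty-printed with one field per line, and each role's "hostId" line appears before its "healthSummary" line. The function name concerning_hosts is made up for illustration.

```shell
#!/bin/sh
# Print the hostId of every role whose healthSummary is CONCERNING.
# Reads the pretty-printed roles JSON from stdin.
# Assumption: within each role, "hostId" appears before "healthSummary"
# (the original grep -B 1 pipeline relies on the same ordering).
concerning_hosts() {
  awk -F '"' '
    # Remember the most recent hostId seen.
    /"hostId"/ { host = $4 }
    # On each healthSummary line, print the remembered host if CONCERNING,
    # then forget it so it cannot leak into the next role.
    /"healthSummary"/ {
      if ($4 == "CONCERNING" && host != "") print host
      host = ""
    }
  '
}
```

Piping the curl output from the command above into concerning_hosts | sort -u would give one line per affected host.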
I played around a bit with timeseries, and was able to pull back that the node has several events with the url:
http://<cm>:7180/api/v10/timeseries?query=select+alerts+where+hostId%<hostid>&chartType=table
Again though, this tells me the node has several alerts, but not what the alerts are.
If I have either the hostid or hostname, how can I pull back why the node is in a concerning status in CM using only the API?
Created on 03-02-2016 03:42 PM - edited 03-02-2016 03:52 PM
> If I have either the hostid or hostname, how can I pull back why the node is in a concerning status in CM using only the API?
[1a] http://cloudera-server:7180/cmf/events
[1b] http://cloudera-server:7180/api/v10/events
[2] https://cloudera.github.io/cm_api/apidocs/v11/path__events.html
Created 03-03-2016 06:48 AM
So I found where I can use the events in the api and add the query by id filter, so it looks like:
http://cm:7180/api/v10/events?query=id==<hostid>
I used a random hostid from the events output to verify the query worked.
When I run the query with the hostid of a node in concerning status, though, I get no results:
"totalResults" : 0,
Since we're doing this for testing, I know that the node is in concerning status due to Log Directory Free Space. But I'm not seeing that on the events page.
I recreated a similar case by creating a large empty file on a random region server to push it into a concerning status due to lack of free space. Querying the events endpoint with the host id pulls back no events, even though the change was very recent. I waited about 4 minutes and checked again, but still wasn't able to pull back the reason.
I then just went to the events page and looked for both hostid and hostname, but couldn't find any events specifically related to lack of free space.
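When scripting this, an empty result like the "totalResults" : 0 above is worth detecting explicitly. A rough sketch, assuming the events response is pretty-printed with the "totalResults" field on its own line as shown; the function name total_results is made up for illustration.

```shell
#!/bin/sh
# Read an events API response on stdin and print its totalResults value.
# Assumption: the field sits on its own line, e.g.   "totalResults" : 0,
total_results() {
  awk -F '[:,]' '
    /"totalResults"/ {
      gsub(/[ \t]/, "", $2)   # strip the padding around the number
      print $2
      exit                    # only the first match is of interest
    }
  '
}
```

A caller could compare the printed value against 0 to tell "no events matched the filter" apart from "the request failed".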
Created 03-03-2016 12:47 PM
>... query by id filter, ...
For clarification, when you say "id==", are you referring to "hostId", or to the unique "id" of the event?
The reason I ask is that in your supplied URL "...query=id==<hostid>", the "id" is the unique ID of an event [1]; the <hostid>, however, is stored under the HOST_IDS key within the "attributes" listing [1]. I think your filter is expected to return 0 results unless you supply a valid event "id".
Does it pull info if you change your query to filter the attributes.HOST_IDS?
ie: http://cm:7180/api/v10/events?query=attributes.HOST_IDS==%22<hostid>%22
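If this gets scripted, the URL can be assembled by a small helper. This is a minimal sketch: the function name events_url_for_host and its two parameters are hypothetical, and the only thing it does is wrap the host id in %22 (a URL-encoded double quote) exactly as in the URL above.

```shell
#!/bin/sh
# Build the events URL that filters on attributes.HOST_IDS.
# $1: CM base URL, e.g. http://cm:7180   (illustrative parameter)
# $2: host id to filter on               (illustrative parameter)
events_url_for_host() {
  cm_base=$1
  host_id=$2
  # %%22 in the printf format emits a literal %22 around the host id.
  printf '%s/api/v10/events?query=attributes.HOST_IDS==%%22%s%%22\n' \
    "$cm_base" "$host_id"
}
```

The result can be passed to curl in quotes, for example curl --silent -u <user>:<pass> "$(events_url_for_host http://cm:7180 <hostid>)".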
Let me know if this helps.
[1] https://cloudera.github.io/cm_api/apidocs/v11/ns0_apiEvent.html
Created 03-03-2016 12:52 PM
Yep, that did it. Didn't realize the id's were event id's and not host id's.
Used the attributes.HOST_IDS and was able to pull back the information for the host.
With this output, I can sort and build alerting off it.
Thank you.