Created 06-20-2017 07:47 AM
Hi,
I've been struggling with Ambari Metrics for a couple of days and cannot figure out how to investigate further:
Basically, I have a secured (Kerberized) HDP 2.5 cluster and I would like to post custom metrics to Ambari Metrics. It's worth noting that the timeline.metrics.service.operation.mode property is set to "embedded", which means (if I understood correctly) that AMS runs an embedded HBase instance with its own ZooKeeper.
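For reference, a quick way to confirm the mode directly on the collector host is something like the following (the config path is assumed from the default AMS layout):
grep -A1 "timeline.metrics.service.operation.mode" /etc/ambari-metrics-collector/conf/ams-site.xml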
Let's call the server running the ambari-metrics collector server1.mydomain.com.
I gave it a try with the following request:
curl -H "Content-Type: application/json" -X POST -d '{"metrics": [{"metricname": "AMBARI_METRICS.SmokeTest.FakeMetric", "appid": "amssmoketestfake", "hostname": "server1.mydomain.com", "timestamp": 1432075898000, "starttime": 1432075898000, "metrics": {"1432075898000": 0.963781711428, "1432075899000": 1432075898000}}]}' "http://server1.mydomain.com:6188/ws/v1/timeline/metrics"
=> Returned HTTP 200, with the following JSON body: {"errors":[]}
Then I tried to retrieve this dummy metric with the following request:
curl -H "Content-Type: application/json" -X GET "http://server1.mydomain.com:6188/ws/v1/timeline/metrics?metricNames=AMBARI_METRICS.SmokeTest.FakeMetric&appId=amssmoketestfake&hostname=server1.mydomain.com"
=> Returned HTTP 200, with the following JSON body: {"metrics":[]}
While trying to figure out why my metrics don't come up in this GET request, I'm running into security concerns:
First, I want to connect to Phoenix/HBase to check whether the metrics were stored properly.
I checked the following properties in the /etc/ambari-metrics-collector/conf/hbase-site.xml file:
- hbase.zookeeper.property.clientPort: 61181
- zookeeper.znode.parent: /ams-hbase-secure
So I tried the following command:
/usr/hdp/current/phoenix-client/bin/sqlline.py server1.mydomain.com:61181:/ams-hbase-secure
I receive the following warning every 15 seconds, and the connection never succeeds:
17/06/20 08:46:04 WARN ipc.AbstractRpcClient: Couldn't setup connection for myuser@mydomain.com to hbase/server1.mydomain.com@mydomain.com
=> Should I use a particular user to execute this command (ams, hbase, ...)? Is it even possible to connect to the embedded Phoenix instance like this?
I also tried to connect to the embedded HBase's ZooKeeper instance with the following command:
zookeeper-client -server server1.mydomain.com:61181
But I couldn't connect and received the following errors:
2017-06-20 09:30:24,306 - ERROR [main-SendThread(server1.mydomain.com:61181):ZooKeeperSaslClient@388] - An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
2017-06-20 09:30:24,306 - ERROR [main-SendThread(server1.mydomain.com:61181):ClientCnxn$SendThread@1059] - SASL authentication with Zookeeper Quorum member failed: javax.security.sasl.SaslException: An error: (java.security.PrivilegedActionException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Server not found in Kerberos database (7))]) occurred when evaluating Zookeeper Quorum Member's received SASL token. Zookeeper Client will go to AUTH_FAILED state.
What's wrong here? I performed a kinit beforehand, but it seems my ticket is not granted sufficient permissions... Should I connect as a specific user in order to read the ZooKeeper content?
Thanks for your help
Created 06-20-2017 09:48 AM
I finally figured out how to connect to the Ambari Metrics tables with Phoenix: by default, sqlline.py points at the "main" HBase configuration, not the AMS embedded instance.
By defining the HBASE_CONF_DIR environment variable, I got it working:
export HBASE_CONF_DIR=/etc/ambari-metrics-collector/conf
/usr/hdp/current/phoenix-client/bin/sqlline.py fr0-datalab-p09.bdata.corp:61181:/ams-hbase-secure
I guess there is something similar for connecting to ZooKeeper, to point at the embedded instance instead of the cluster's "main" ZooKeeper, but I couldn't solve this for the moment...
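One read-only workaround that might be worth trying (a sketch only, and it assumes the znodes you want to browse are world-readable) is to disable SASL on the ZooKeeper client so it does not attempt Kerberos at all:
export CLIENT_JVMFLAGS="-Dzookeeper.sasl.client=false"
zookeeper-client -server server1.mydomain.com:61181
# then, inside the client shell, e.g.: ls /ams-hbase-secure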
Created 06-20-2017 06:26 PM
A couple of pointers.
One reason your metrics might have been discarded is that, by default, AMS discards data that is more than 5 minutes in the past. Check whether the "starttime" in your request is more than 5 minutes earlier than the time you made the request. This 5-minute window can be changed by adding a custom ams-site config, 'timeline.metrics.service.outofband.time.allowance.millis', which is the discard time boundary in milliseconds.
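For illustration, here is a minimal sketch of re-posting the same dummy metric with a current timestamp so it falls inside that window (it assumes GNU date with millisecond support; the hostname and metric names are the dummy values from the question):
NOW=$(date +%s%3N)   # current epoch time in milliseconds
curl -H "Content-Type: application/json" -X POST -d "{\"metrics\": [{\"metricname\": \"AMBARI_METRICS.SmokeTest.FakeMetric\", \"appid\": \"amssmoketestfake\", \"hostname\": \"server1.mydomain.com\", \"timestamp\": ${NOW}, \"starttime\": ${NOW}, \"metrics\": {\"${NOW}\": 0.963781711428}}]}" "http://server1.mydomain.com:6188/ws/v1/timeline/metrics"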
Also, can you check whether the metric was at least tracked by the AMS metadata, via the http://server1.mydomain.com:6188/ws/v1/timeline/metrics/metadata URL? Search for your custom metric name or appId.
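For example, something like this should show whether the metric name was registered (piping through python -m json.tool is only for readability and assumes python is on the path):
curl -s "http://server1.mydomain.com:6188/ws/v1/timeline/metrics/metadata" | python -m json.tool | grep -i -A3 "amssmoketestfake"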
To connect to AMS Phoenix, you can first ssh onto the collector host.
You should be looking at the METRIC_RECORD table.
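Putting those steps together, a sketch of the check might look like this (it reuses the embedded config dir and connect string shown earlier in the thread; the column names in the sample query are assumed from the default AMS Phoenix schema, so adjust to what !describe METRIC_RECORD reports):
export HBASE_CONF_DIR=/etc/ambari-metrics-collector/conf
/usr/hdp/current/phoenix-client/bin/sqlline.py server1.mydomain.com:61181:/ams-hbase-secure
# then, inside sqlline:
# SELECT METRIC_NAME, APP_ID, HOSTNAME, SERVER_TIME FROM METRIC_RECORD WHERE METRIC_NAME = 'AMBARI_METRICS.SmokeTest.FakeMetric' LIMIT 10;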
Created 06-21-2017 12:01 PM
Thanks a lot for these helpful pointers. I was not aware of this time boundary, and as soon as I changed the timestamps in my JSON data, it started working fine!
Created 07-02-2018 04:07 PM
Hello,
I'm facing a similar issue with metrics not being stored in the HBase METRIC_RECORD table. I can, however, see the metrics being tracked, since they are returned by the /ws/v1/timeline/metrics/metadata endpoint.
I have set: timeline.metrics.service.outofband.time.allowance.millis=600000
Just to give some background, I'm using StormTimelineMetricsSink to push custom topology metrics to the metrics collector, and the log statements show that the metrics are being emitted properly. Although the metric names show up in the Grafana dropdown, there are no values to plot. I do see other metric values showing up in the default graphs for AMS_HBASE, HOST, etc.
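For what it's worth, a diagnostic that might narrow this down is querying the collector API directly for one of the topology metrics, bypassing Grafana. In the sketch below, YOUR_TOPOLOGY_METRIC and YOUR_APPID are placeholders for the metric name and appId shown in the metadata endpoint, the hostname reuses the thread's placeholder, and passing startTime/endTime as epoch seconds is an assumption about this AMS version:
END=$(date +%s)
START=$((END - 3600))   # look back one hour
curl "http://server1.mydomain.com:6188/ws/v1/timeline/metrics?metricNames=YOUR_TOPOLOGY_METRIC&appId=YOUR_APPID&startTime=${START}&endTime=${END}"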
I did an rm -rf on the hbase-tmp/ folder a couple of times and started clean. I also verified there is plenty of space on the disk.
Could you please help me identify what's missing in order to see the custom data pushed to the embedded HBase?