CM api timeseries module not returning metrics queried

Hi guys,

I am using the cm_api "timeseries" module to query HDFS data from 7/05 to 7/28, but the data returned only covers 7/19 to 7/28. In the CDH UI, however, I can see HDFS data as far back as 7/15. Is there a reason CM would be withholding data this way, and is there some maximum threshold for queries?

Any help on this matter would be much appreciated!

Thanks,

- Ryan

Re: CM api timeseries module not returning metrics queried

By the way, here is the code I am using to grab the metrics:

import pandas as pd

from datetime import datetime
from cm_api.api_client import ApiResource
from cm_api.endpoints import timeseries

# Placeholder: metric_name_maps was not included in the posted snippet.
# This identity mapping exists only so the script runs as shown.
metric_name_maps = {'CDH': {'dfs_capacity_used': 'dfs_capacity_used',
                            'dfs_capacity': 'dfs_capacity'}}

def process_CDH_result(result):
    # result is a list of responses; the first one holds the time series
    print(result)
    ts_list = result[0]
    node_list = []
    for ts in ts_list.timeSeries:
        nodename = ts.metadata.entityName
        name_strings = ['HDFS', 'hdfs']
        # Keep only the series whose entity name refers to HDFS
        if any(x in nodename for x in name_strings):
            timestamps, values = [], []
            for point in ts.data:
                timestamps.append(point.timestamp)
                values.append(point.value)
            df = pd.DataFrame({'time': timestamps, 'value': values})[['time', 'value']]
            # Use this series' own metric name, not the first series' name
            node_list.append({'Node_name': nodename, 'Metric_name': ts.metadata.metricName, 'Data': df})
    return node_list

def get_CDH_metrics(hostname, creds, time_range):
    user, pw = creds
    start, end = time_range  # epoch seconds
    api = ApiResource(hostname, 7180, user, pw, version=16)
    metrics = ["dfs_capacity_used", "dfs_capacity"]
    metric_dict = {}
    for metric in metrics:
        # Ask for hourly rollups only, over the whole window
        result = timeseries.query_timeseries(api, query="select " + metric, from_time=datetime.fromtimestamp(start), to_time=datetime.fromtimestamp(end), desired_rollup='HOURLY', must_use_desired_rollup=True)

        # First HDFS series returned for this metric
        node = process_CDH_result(result)[0]
        new_df = node['Data'].set_index('time').rename(index=str, columns={'value': node['Node_name']})
        print(new_df)
        metric_dict[metric_name_maps['CDH'][metric]] = new_df
    return metric_dict

# The hostname must be a string; the bare name localhost is undefined
metric_dict = get_CDH_metrics('localhost', ('admin', 'admin'), (1525244400, 1534461757))
print(metric_dict)
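
One way to test whether the size of the query window matters is to split the range into smaller pieces, run one query per piece, and compare the earliest timestamp that comes back with the start that was requested. The sketch below only illustrates that idea. `split_time_range` and `covered_span` are hypothetical helpers, not part of cm_api, and the one-week window is an arbitrary assumption rather than a known CM limit.

```python
from datetime import datetime


def split_time_range(start, end, step_seconds):
    """Split [start, end) in epoch seconds into consecutive windows
    of at most step_seconds each (hypothetical helper)."""
    if step_seconds <= 0:
        raise ValueError("step_seconds must be positive")
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step_seconds, end)
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


def covered_span(timestamps):
    """Return (earliest, latest) of the returned timestamps,
    or None when no points came back (hypothetical helper)."""
    timestamps = list(timestamps)
    if not timestamps:
        return None
    return min(timestamps), max(timestamps)


# Assumed window size of one week; adjust as needed.
ONE_WEEK = 7 * 24 * 3600

for window_start, window_end in split_time_range(1525244400, 1534461757, ONE_WEEK):
    # Each pair would be passed to query_timeseries as from_time / to_time,
    # and covered_span() applied to the timestamps of the returned points.
    print("%s -> %s" % (datetime.fromtimestamp(window_start),
                        datetime.fromtimestamp(window_end)))
```

If every small window still starts at the same date, the missing days are probably not caused by the length of the requested range, though this check alone does not show where the cutoff comes from.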