Programmatically tracking MR Job status using the Cloudera Manager API (and Python libs)?
Created on 06-30-2014 03:13 PM - edited 09-16-2022 02:01 AM
Hi Cloudera community:
tl;dr: Is there a more straightforward way to query the Cloudera Manager API for information about a job (mapper/reducer completion, bytes processed, etc.), perhaps by simply providing a job ID or job name?
I am using a Python script and the Cloudera Manager API to check the status of various MapReduce jobs, roughly like this:
```python
from cm_api.api_client import ApiResource

api = ApiResource('zzzz', version=1, username='zzz', password='zzz')
for s in api.get_cluster('my cluster').get_all_services():
    if s.name == 'MR':
        # my activities are in s.get_running_activities()
        activities = s.get_running_activities()
```
I then retrieve the job IDs from the MR activities and use the Hadoop command 'mapred job -status' to ascertain information about them.
I am using CDH 4, and I am not currently using YARN.
Thanks!
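For comparison, the activity list that `s.get_running_activities()` returns (presumably backed by the service's `activities` REST resource) can also be fetched with only the Python standard library. This is a minimal sketch under stated assumptions: API v1 over plain HTTP on port 7180, HTTP basic auth, and a JSON response shaped like `{"items": [{"id": "job_...", ...}, ...]}` where `id` holds the MapReduce job ID. The helper names `activities_url`, `job_ids`, and `fetch_json` are hypothetical, not part of cm_api.

```python
import base64
import json
import urllib.parse
import urllib.request


def activities_url(host, cluster, service, api_version=1, port=7180):
    """Build the (assumed) activity-list URL for a MapReduce service."""
    # Quote each path component so names like 'my cluster' become 'my%20cluster'.
    path = "clusters/%s/services/%s/activities" % (
        urllib.parse.quote(cluster), urllib.parse.quote(service))
    return "http://%s:%d/api/v%d/%s" % (host, port, api_version, path)


def job_ids(activity_list):
    """Extract job IDs from a decoded activity-list response.

    Assumes the payload looks like {"items": [{"id": "job_...", ...}, ...]};
    entries without an "id" field are skipped.
    """
    return [item["id"] for item in activity_list.get("items", []) if "id" in item]


def fetch_json(url, username, password):
    """GET a Cloudera Manager API URL with HTTP basic auth and decode the JSON body."""
    token = base64.b64encode(("%s:%s" % (username, password)).encode()).decode()
    req = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Example usage (needs a reachable Cloudera Manager; placeholder host and credentials):
# ids = job_ids(fetch_json(activities_url('zzzz', 'my cluster', 'MR'), 'zzz', 'zzz'))
```

Keeping URL construction and response parsing separate from the network call makes each piece easy to check without a live cluster.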
Created ‎07-20-2014 08:56 AM
Created 07-21-2014 01:38 PM
I initially found this confusing because the Python library for the Cloudera Manager API lacks helper functions for the activity metrics endpoint (clusters/{cluster}/services/{service}/activities/{activity}/metrics). Nonetheless, it is easy to call that endpoint directly from Python. I will look into adding a helper class to the open-source Python library.
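```python
import requests
from urllib.parse import quote  # Python 2: from urllib import quote

HOST = 'myhost'
USERNAME = 'admin'  # placeholder Cloudera Manager credentials
PASSWORD = 'admin'
CLUSTER_NAME = 'mycluster'
SERVICE = 'mapreduce1'
ACTIVITY_ID = 'your_activity_job_id'

# Build the activity metrics path, URL-encoding each component
# (cluster names may contain spaces, e.g. 'my cluster').
parameters = 'clusters/%s/services/%s/activities/%s/metrics' % (
    quote(CLUSTER_NAME), quote(SERVICE), quote(ACTIVITY_ID))
# requests needs an explicit scheme; Cloudera Manager listens on 7180 by default.
url = 'http://%s:7180/api/v1/%s' % (HOST, parameters)

r = requests.get(url, auth=(USERNAME, PASSWORD))
r.raise_for_status()  # fail loudly on 401/404 instead of printing an error page
print(r.json())
```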
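The metrics endpoint returns time series rather than a single number per counter. A minimal sketch for reducing the response to the most recent value of each metric follows; it assumes the decoded JSON is shaped like `{"items": [{"name": ..., "data": [{"timestamp": ..., "value": ...}, ...]}, ...]}` with data points in chronological order (check this against the API documentation for your Cloudera Manager version). The function name `latest_metric_values` is hypothetical.

```python
def latest_metric_values(metric_list):
    """Map each metric name to the value of its most recent data point.

    Assumes `metric_list` is the decoded JSON from the activity metrics
    endpoint and that each metric's "data" list is in chronological order.
    Metrics with no data points are skipped.
    """
    latest = {}
    for metric in metric_list.get("items", []):
        points = metric.get("data") or []
        if not points:
            continue
        # Last point = newest, under the chronological-order assumption.
        latest[metric["name"]] = points[-1]["value"]
    return latest
```

With the request code, `latest_metric_values(r.json())` would give a flat name-to-value dict that is easier to log or compare between polls.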
