Using Python to delete Hue/Cloudera users

Rising Star

Hi,

 

My background:

I am new to Hadoop/Hue/Cloudera and am digging through the source of the page where users are displayed.  I do have web development experience and have developed web-based solutions for many years.  I have tried to use the "requests" module in Python, and although I can get a CSRF token and load pages, I'm not getting everything because the .js and .css files are cached and are not being loaded.  So some things, like the user list, aren't loading.

 

I have been researching for a few days and have come to the conclusion that it's not going to be an easy task to figure out how to delete users using Python.

We get a feed from HR with a list of users to remove from Hue and Cloudera.  We need to automate the process of removing users from the 2 apps.

 

It appears the deletion on the page is done purely with JavaScript.  I haven't resorted to tcpdump to see what call is being made, but I am close.

 

The options I see are:

  • Use tcpdump and figure out the call made and use the Python "requests" module to post the same
  • Use an API that, as far as I can tell, does not exist
  • Use Django models and functions to remove the users.

Does anyone have any experience and would not mind sharing how they did it or how they think it can be done?

 

Thanks!

 

3 ACCEPTED SOLUTIONS

Master Guru

@pollard,

 

You cannot use the CM API to delete Hue users.

I am not certain if the cm_api is compatible with Python 3.

If Python is not working for you, you can try using the REST API:

 

https://cloudera.github.io/cm_api/apidocs/v5.15.0/path__users_-userName-.html

 

Here is an example of how to delete a CM user named "deleteme"

 

# curl -u admin:admin -H "Content-Type: application/json" -X DELETE http://cm_host.example.com:7180/api/v19/users/deleteme
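For scripting, the same DELETE call can be built with Python's standard library. This is only a minimal sketch, not an official client: the function name is hypothetical, and the host, port, API version (v19) and admin:admin credentials are simply carried over from the curl example. The request is constructed here but not sent.

```python
import base64
import urllib.parse
import urllib.request


def build_cm_delete_request(base_url, cm_user, cm_password, username_to_delete):
    """Build (but do not send) a DELETE request for one CM user.

    Hypothetical helper; base_url is e.g. "http://cm_host.example.com:7180/api/v19".
    """
    # Percent-encode the username so unusual characters stay inside the path segment.
    url = "{}/users/{}".format(
        base_url.rstrip("/"), urllib.parse.quote(username_to_delete, safe=""))
    # HTTP Basic auth, the same scheme that `curl -u user:password` uses.
    token = base64.b64encode("{}:{}".format(cm_user, cm_password).encode("utf-8"))
    headers = {
        "Authorization": "Basic " + token.decode("ascii"),
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="DELETE")


request = build_cm_delete_request(
    "http://cm_host.example.com:7180/api/v19", "admin", "admin", "deleteme")
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes it easy to inspect the URL and headers before anything is deleted.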

 

There is no out-of-the-box way to delete Hue users.  You could probably figure out how to do it with curl, but here is one way that I know of that could be scripted:

 

# JAVA_HOME=/usr/java/jdk1.8.0_152 HUE_CONF_DIR=/var/run/cloudera-scm-agent/process/`ls -lrt /var/run/cloudera-scm-agent/process/ | awk '{print $9}' |grep HUE_SERVER| tail -1` HADOOP_CREDSTORE_PASSWORD=`grep environment $HUE_CONF_DIR/supervisor.conf |sed "s/.*HADOOP_CREDSTORE_PASSWORD='\([^']*\).*/\1/"` /opt/cloudera/parcels/CDH/lib/hue/build/env/bin/hue shell << EOF
> from django.contrib.auth.models import User
> user = User.objects.get(username="wgnmaxgcuj")
> user.delete()
> EOF

 

In the above, use your Java location, and pass in the username of the user you want to delete in place of "wgnmaxgcuj".

The above also assumes that you are using Cloudera Manager and parcels.
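To handle a whole HR feed rather than one name, the three heredoc lines could be generated from a list. The sketch below is an assumption-laden illustration, not a Cloudera-provided tool: `build_hue_delete_script` is a hypothetical helper that only produces the text to feed to `hue shell` on stdin, and it assumes plain ASCII usernames. It uses Django's `filter(username__in=...)` instead of `get(...)`, so a username that does not exist is skipped rather than raising an exception.

```python
def build_hue_delete_script(usernames):
    """Return Python source that deletes the given usernames via Django's ORM.

    Hypothetical helper: the returned text is meant to be piped to `hue shell`.
    """
    # De-duplicate and sort so the generated script is stable between runs.
    names = sorted(set(usernames))
    # repr() of a list of str yields a valid Python list literal.
    return (
        "from django.contrib.auth.models import User\n"
        "names = {!r}\n"
        "found = User.objects.filter(username__in=names)\n"
        "print('deleting: %s' % list(found.values_list('username', flat=True)))\n"
        "found.delete()\n"
    ).format(names)


script = build_hue_delete_script(["wgnmaxgcuj"])
print(script)
```

The printed text can replace the lines between `<< EOF` and `EOF` in the command above.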

 

Master Guru

@pollard,

 

Sorry, don't have enough time to dig too deep on this.

 

I did use tcpdump and Wireshark to see this is how the ids are sent:

 

csrfmiddlewaretoken=NlsLy4K7LBHIiSHsalA8isMiQtIDk4FQ&is_embeddable=true&user_ids=17&user_ids=18&user_ids=12
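The captured body simply repeats the `user_ids` key once per id. As a small illustration using only the standard library (the dict name `form` is mine, the values are the ones from the capture), `urlencode` with `doseq=True` produces exactly that shape from a list:

```python
from urllib.parse import urlencode

form = {
    "csrfmiddlewaretoken": "NlsLy4K7LBHIiSHsalA8isMiQtIDk4FQ",
    "is_embeddable": "true",
    "user_ids": ["17", "18", "12"],
}
# doseq=True expands a list value into one key=value pair per element.
body = urlencode(form, doseq=True)
print(body)
```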

 

Passwords cannot be downloaded from CM.

The CM API only allows deletion of a CM user (stored in CM's database).  Hue Users are stored in the Hue database.

The curl example is for CM only.

 

The other example (using Hue command line) is for Hue.

 

Use TLS for your CM API connections so the plain text password submitted is encrypted over the wire.

 

Same when accessing Hue UI... TLS.  There is no way to encrypt the password before sending at this time.

Rising Star

@bgooley

Got it!

I'm relatively new to Python development and was trying to figure out what was an acceptable form of multiple criteria for a single POST request for the same key.  <== Maybe that sentence will help someone find this later on.

I used tcpdump as well but could not see the POST data as it was encrypted with SSL.  I was not able to turn off SSL to see plain text content as I don't have access to the server.  I was exploring using a cert to decrypt but have never done that using tcpdump.  I would love to know how, as I have run into trying to debug traffic and could not see the content.

Anyway, I was having trouble creating a dict for the form data, as a dict only accepts a given key once.  I didn't realize that a single dict key holding an array (list) would work as well.

This line was a problem in that Python would not let me add multiple key entries like "user_ids='4', user_ids='5'".  I didn't realize all it needed was a list.

 

form_data = dict(csrfmiddlewaretoken=session.cookies['csrftoken'], next='/hue/useradmin/users/', user_ids=['4', '5'])

I tried every combination I could think of but made the mistake of surrounding the values in quotes, which made them a single text argument instead of a list.
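Putting the pieces from this thread together, a multi-user delete might be wrapped up roughly as follows. Treat it as a sketch under assumptions: `delete_hue_users` is a hypothetical name, `session` is expected to be an already logged-in `requests.Session` holding Hue's `csrftoken` cookie, and the `/useradmin/users/delete` path and `next` value are the ones used in the posts here and may differ between Hue versions.

```python
def delete_hue_users(session, base_url, user_ids, referer):
    """POST one delete request covering every id in user_ids.

    `session` should behave like a logged-in requests.Session whose
    cookie jar already holds Hue's 'csrftoken' cookie.
    """
    form_data = {
        "csrfmiddlewaretoken": session.cookies["csrftoken"],
        "next": "/hue/useradmin/users/",
        # A list value is sent as user_ids=4&user_ids=5, one pair per id.
        "user_ids": [str(uid) for uid in user_ids],
    }
    return session.post(
        base_url.rstrip("/") + "/useradmin/users/delete",
        data=form_data,
        headers={"Referer": referer},
    )
```

A call would look like `delete_hue_users(session, "http://<Hue Server:Port>", [4, 5], login_url)`, with `session` and `login_url` set up as in the earlier single-user code.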

What you did for me was verify a few things and put me on the right trail.

 

Thanks to everyone for their contribution!


9 REPLIES

Rising Star

Thank you sbpothineni for your response.

 

I have been digging pretty hard and found out that the "cm_api" Python module doesn't appear to have been updated for 3.x versions of Python.  Since I have to use Python 3.x for the Django libraries, does anyone know if Cloudera will update their Python API anytime soon to be compatible with Python 3.x?

I get an error in one of their modules, which uses an older version of urllib and Python 2.7 syntax for exceptions.

My other question is can you use the cm_api Python module to manage users in Hue or is that separate?

 

Thanks!

Rising Star

That sounds right.  I didn't think there would be an easy solution.  And, no, "cm_api" has not been updated since the Python 2.x days.  I have figured out how to delete a Hue user by creating an HTTP POST request that simulates the web interface doing the deletion.  The one thing I haven't figured out yet is how to delete multiple users in one request.  If I have a list of users, I assume I need to format the request somehow like 'user_ids="1,2,37,45"', but I can't figure out what it needs as there is no documentation that I can find.

The other problem I have to resolve is how to get the internal ID for a given user.  The only way I know to do it is load a DOM object and find what I'm looking for by traversing the DOM.  Sort of like a screen scraping...  🙂

At least I know I can load the Users list page and parse it if need be.

 

This actually helps in that it gives me some indication of the functionality for managing users through Django as well as through an actual API.  The API looks simple.  The problem with using the model is that I will have to get access to the servers that Cloudera is running on.  I might be able to arrange that, but there's a bit of bureaucracy involved.

 

In the meantime, it would be most helpful to find out how the JavaScript in the Hue interface stores multiple user IDs when you click on them, and what format they are in when the server receives the form with the "user_ids" field holding the list of users to delete.

 

This code actually works:


# Assumes `session` is a logged-in requests.Session and `login_url` is the Hue login page URL.
form_data = dict(user_ids="4", csrfmiddlewaretoken=session.cookies['csrftoken'], next='/hue/useradmin/users/')
r = session.post('http://<Hue Server:Port>/useradmin/users/delete', data=form_data, cookies=session.cookies, headers=dict(Referer=login_url))

That works for a single user.  I could loop through the users one at a time, but I would prefer to be more efficient, and I learn more about it at the same time.

My apologies for not getting back before now as I never got an email or missed the email notification that it had been updated.  

 

Thanks!

Master Guru

@pollard,

 

I think it is a JavaScript list passed in as a parameter to the POST.

 

You may be able to use Wireshark to view how the form is submitted on the web page.

Internally, we use:

 

ids = request.POST.getlist('user_ids')
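Django's `getlist` returns every value submitted under a key, which suggests the form sends `user_ids` more than once rather than as one comma-separated string. A rough plain-Python approximation of that behavior (not Hue's actual code; the sample body and ids are made up for illustration) is `urllib.parse.parse_qs`:

```python
from urllib.parse import parse_qs

# Hypothetical form body with the user_ids key repeated once per user.
body = "user_ids=4&user_ids=5&next=%2Fhue%2Fuseradmin%2Fusers%2F"
fields = parse_qs(body)
# Like getlist, fall back to an empty list when the key is absent.
ids = fields.get("user_ids", [])
print(ids)
```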

 

Rising Star

@bgooley

I believe you are correct.  I have been trying to debug the JavaScript code without much success.  I believe the array simply holds the selected users, meaning those that have a checkmark beside them, and each time a user is selected the array is erased and rebuilt.  I assume this array is serialized somehow, as a JavaScript array object would not be compatible with a form element that would hold it.  It is this serialization, and the final form that the POST function "getlist" actually expects, that I am trying to work out.

The method you are using is also used in Django in the Hue application.  Unfortunately, I cannot get to the code and inject any debug messages to see what it looks like.

Maybe you can tell me what form the "user_ids" is in for you and that might give me a hint.

I would assume you can debug the code and see what the POST element contains???

 

Thanks!!!

Rising Star

@bgooley

You have supplied a partial solution as I took your curl example and converted it to Python.  Now, if Hue had this same API functionality, it would be awesome!

I assume the code in your example up to the beginning of the heredoc syntax will extract a password?  If so, for what user?  Is this for Hue?  Is there one for Cloudera?

I need a secure way to provide a password without using clear text in Python code for both applications.
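One common way to keep a password out of the source file, sketched here purely as an illustration, is to read it from an environment variable and fall back to an interactive prompt. The helper name and the variable name `CM_ADMIN_PASSWORD` are hypothetical; this keeps the secret out of the code but does not by itself protect it on the machine or on the wire.

```python
import getpass
import os


def get_password(env_var, prompt):
    """Return the password from an environment variable, else prompt for it."""
    value = os.environ.get(env_var)
    if value:
        return value
    # getpass reads from the terminal without echoing the typed characters.
    return getpass.getpass(prompt)


# Example (not executed here):
# cm_password = get_password("CM_ADMIN_PASSWORD", "CM admin password: ")
```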

 
