Add file to base CDSW image build.

Explorer

CDSW version 1.7

I need to add a file to every image that CDSW builds.

jupyter_notebook_config.py needs to be placed in the .jupyter directory of every project.

This is because our organization needs CDSW sessions to be culled after inactivity, and Jupyter notebooks prevent the built-in idle timeout from functioning.

 

1 ACCEPTED SOLUTION

Explorer
What our organization did was create a base project that has everything set up, and we get everyone to pull from it. It is really not ideal, but it was the best we could come up with given our limited control over the product. You are correct that the .jupyter directory does not exist until Jupyter Notebook is run, and we could not put the file in any directory that would get copied during the Docker build. We had actually upgraded over the weekend for other reasons, and I tested out the auto-kill for Jupyter.


If you are a Cloudera rep, or a rep reads this, please understand that the main reason our organization chose CDSW is this functionality and the ability to edit the Docker build. We have been through Anaconda Enterprise and IBM Watson Studio, and even tried to run JupyterHub. Our organization has 150+ data scientists/analysts, and they are all irresponsible when it comes to stopping their sessions. Anaconda was by far the worst-performing product/company to work with for support.


Now with CDSW we can propagate project configuration, tutorial scripts, and Spark and Hive config from the top down to all projects via the Docker build, and we love it.


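Below are the contents of jupyter_notebook_config.py, located in .jupyter. It worked for us.

# Shut down the notebook server after 3600 seconds with no kernels or terminals running and no activity
c.NotebookApp.shutdown_no_activity_timeout = 3600
# Cull kernels that have been idle for more than 2600 seconds
c.MappingKernelManager.cull_idle_timeout = 2600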
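
Since the .jupyter directory does not exist until Jupyter Notebook has been run, one way to get the file in place is to create the directory first and then write the config into it. The following is only a minimal sketch of that idea, not something CDSW provides: the function name `install_jupyter_config` and the decision of where to call it from (for example a script in the base project) are assumptions made for illustration. The two settings and their values are the ones quoted above.

```python
import tempfile
from pathlib import Path

# The two settings quoted in the post above; values are in seconds.
CONFIG_TEMPLATE = """\
c.NotebookApp.shutdown_no_activity_timeout = {shutdown}
c.MappingKernelManager.cull_idle_timeout = {cull}
"""


def install_jupyter_config(home: Path, shutdown: int = 3600, cull: int = 2600) -> Path:
    """Write jupyter_notebook_config.py under <home>/.jupyter.

    Hypothetical helper: it creates the .jupyter directory if it is missing,
    so it does not depend on Jupyter having been started before.
    """
    config_dir = home / ".jupyter"
    config_dir.mkdir(parents=True, exist_ok=True)
    config_path = config_dir / "jupyter_notebook_config.py"
    config_path.write_text(CONFIG_TEMPLATE.format(shutdown=shutdown, cull=cull))
    return config_path


if __name__ == "__main__":
    # Demo against a throwaway directory; a real run would pass Path.home().
    with tempfile.TemporaryDirectory() as tmp:
        written = install_jupyter_config(Path(tmp))
        print(written.read_text())
```

The demo writes to a temporary directory so it can be tried without touching an existing configuration.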


2 REPLIES

Contributor

This is a pretty interesting question. At first I was going to suggest just using a COPY command in your Dockerfile to copy this file over; however, I'm not sure that the .jupyter directory exists until you start up a CDSW session with a Jupyter notebook.

 

Can you show me what you are adding in the config file to time out the Jupyter notebooks?

You are correct that Jupyter notebooks do not time out based on the IDLE_MAXIMUM_MINUTES environment variable. RStudio sessions do not either, and this has been a long-running and difficult issue since Cloudera doesn't write or control this code. It looks like a lot of this is fixed in CDSW 1.9, though.


If you just want to time out Jupyter notebooks, you could try editing the Jupyter Notebook command to add this:

NOTEBOOK_TIMEOUT_SECONDS=$(python3 -c "print(${IDLE_MAXIMUM_MINUTES}*60)"); /usr/local/bin/jupyter notebook --no-browser --ip=127.0.0.1 --port=${CDSW_APP_PORT} --NotebookApp.token= --NotebookApp.allow_remote_access=True --NotebookApp.quit_button=False --log-level=ERROR --NotebookApp.shutdown_no_activity_timeout=300 --MappingKernelManager.cull_idle_timeout=${NOTEBOOK_TIMEOUT_SECONDS} --TerminalManager.cull_inactive_timeout=${NOTEBOOK_TIMEOUT_SECONDS} --MappingKernelManager.cull_interval=60 --TerminalManager.cull_interval=60 --MappingKernelManager.cull_connected=True

This will kill Jupyter notebooks that have been inactive for longer than IDLE_MAXIMUM_MINUTES (60 minutes by default).
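
To make the minutes-to-seconds conversion in that command explicit, here is a rough Python sketch that builds the kernel-culling flags from an idle limit given in minutes. The function name `cull_flags` is made up for illustration, and the fallback of 60 minutes when IDLE_MAXIMUM_MINUTES is unset is an assumption based on the default mentioned above. The flag names are the ones used in the command.

```python
import os
from typing import List


def cull_flags(idle_minutes: int, cull_interval: int = 60) -> List[str]:
    """Build Jupyter Notebook CLI flags that cull kernels idle for idle_minutes.

    Hypothetical helper: the culling timeout is expressed in seconds,
    so the minutes value is multiplied by 60.
    """
    timeout_seconds = idle_minutes * 60
    return [
        f"--MappingKernelManager.cull_idle_timeout={timeout_seconds}",
        f"--MappingKernelManager.cull_interval={cull_interval}",
        "--MappingKernelManager.cull_connected=True",
    ]


if __name__ == "__main__":
    # Assumed fallback: 60 minutes when the environment variable is not set.
    minutes = int(os.environ.get("IDLE_MAXIMUM_MINUTES", "60"))
    print(" ".join(cull_flags(minutes)))
```

With the 60-minute default, the first flag comes out as a 3600-second idle timeout, and the kernel manager checks for idle kernels every 60 seconds.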

 

There are a few caveats to this. The main one is that this still won't kill Jupyter terminals, due to the version of Jupyter Notebook that CDSW 1.7 uses. Also, users will not get a warning; their notebook and the corresponding CDSW session will just get killed.

 

You can try this and let me know if it works. I'm also curious about your config file that you want to add into .jupyter. 