Created 07-06-2020 07:48 AM
Hi everyone
The CDSW base image v10 ships with R 3.5.1 (as documented in the CDSW Docs -> Pre-Installed Packages).
As this is an older R version, a lot of R developers would like to upgrade to a more recent version: either 3.6.3 or 4.0.x (most recent version is 4.0.2 which was just released).
Has anyone done this successfully, so that the updated R version can be used from both the Workbench - R editor and RStudio? What upgrade steps were involved?
Thanks!
Created 07-06-2020 11:34 AM
@mattematics One approach would be to clone the base image and then install or upgrade the desired packages. You can start from the blog and doc below for reference.
https://blog.cloudera.com/customizing-docker-images-in-cloudera-data-science-workbench/
Created 07-07-2020 03:02 AM
Hi @GangWar
This is what I was aiming at.
My current Dockerfile roughly looks as follows:
#Dockerfile
FROM docker.repository.cloudera.com/cdsw/engine:10
WORKDIR /tmp
ENV R_HOME=/usr/local/lib/R
RUN wget http://cran.rstudio.com/src/base/R-3/R-3.6.3.tar.gz && \
tar xvf R-3.6.3.tar.gz && \
cd R-3.6.3 && \
./configure --prefix=/usr/local --enable-R-shlib && \
make && \
make install && \
rm -rf /usr/local/bin/R && \
rm -rf /usr/local/bin/Rscript && \
ln -s /usr/local/lib/R/bin/R /usr/local/bin/R && \
ln -s /usr/local/lib/R/bin/Rscript /usr/local/bin/Rscript && \
printf '# make libR.so visible to ld.so\n/usr/local/lib/R/lib\n' > /etc/ld.so.conf.d/libR.conf && \
ldconfig && \
cd .. && \
rm -rf R-3.6.3.tar.gz && \
rm -rf R-3.6.3
# the java installation is mounted at CDSW session run time - copy it to the build context here
COPY ./java /usr/lib/jvm/java-openjdk
RUN export JAVA_HOME=/usr/lib/jvm/java-openjdk && \
R CMD javareconf
RUN Rscript -e "update.packages(checkBuilt=TRUE, ask=FALSE, repos='https://cloud.r-project.org')"
# remove java installation again since it is mounted at runtime
RUN rm -rf /usr/lib/jvm
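A quick sanity check for an image built this way is to confirm that the R on the PATH really reports the version that was compiled. The following is only a minimal sketch: the function names r_version_from_banner and check_r_version are hypothetical helpers, not part of CDSW or R, and the parsing assumes the first line of the banner has the form "R version X.Y.Z (...)" as in the 3.6.3 banner.

```shell
# Hypothetical helper: extract "X.Y.Z" from the banner printed by `R --version`.
# Assumes the first line looks like: R version 3.6.3 (2020-02-29) -- "..."
r_version_from_banner() {
  printf '%s\n' "$1" | sed -n 's/^R version \([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Hypothetical helper: succeed only if the banner reports the expected version.
#   $1 = expected version, e.g. 3.6.3
#   $2 = banner text, e.g. "$(R --version)"
check_r_version() {
  expected=$1
  actual=$(r_version_from_banner "$2")
  [ "$actual" = "$expected" ]
}
```

Inside a session or a build step this could be called as check_r_version 3.6.3 "$(R --version)", and the same check can be repeated against /usr/local/lib/R/bin/R to make sure both paths resolve to the new build.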
I am successfully able to launch CDSW sessions with this container with the following editors:
The remaining issue is that launching a session with the Workbench - R editor is not possible. The session exits immediately and the following error is visible in the log:
...
PID of main R process is 205
PID of parser R process is 207
R has exited with code 2 and signal null
Exiting with code 2
...
Using the base image, which works with the Workbench - R editor, I checked which processes CDSW tries to launch in this case. These are the following two:
cdsw 197 51 0 08:29 ? 00:00:00 /usr/local/lib/R/bin/exec/R --sense --no-readline --args
cdsw 198 51 0 08:29 ? 00:00:00 /usr/local/lib/R/bin/Rserve --RS-socket /tmp/cdsw-rserve-x0jbdet9dfe1428o.sock --RS-source /usr/local/lib/node_modules/r-engine/lib/parse.utils.r --slave
Using my own image and a Workbench - Python session, I was able to successfully execute these two processes.
Process 1:
bash$ /usr/local/lib/R/bin/exec/R --sense --no-readline --args
WARNING: unknown option '--sense'
R version 3.6.3 (2020-02-29) -- "Holding the Windsock"
Copyright (C) 2020 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>
Process 2:
bash$ /usr/local/lib/R/bin/Rserve --RS-socket /tmp/cdsw-rserve-x0jbdet9dfe1428o.sock --RS-source /usr/local/lib/node_modules/r-engine/lib/parse.utils.r --slave
Rserve started in non-daemon mode.
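Both commands above were run from an interactive terminal, which may not match how CDSW starts them. As a minimal sketch for narrowing this down, the helper below runs a command with stdin detached from the terminal and reports its exit code, so it can be compared with the "code 2" in the session log. The name run_detached is hypothetical, and it is only an assumption that CDSW launches the R process without a terminal attached.

```shell
# Hypothetical helper: run a command with stdin redirected from /dev/null
# (no terminal attached) and print the exit code it finished with.
run_detached() {
  status=0
  "$@" </dev/null || status=$?
  printf 'exit code: %s\n' "$status"
  return "$status"
}
```

For example, run_detached /usr/local/lib/R/bin/exec/R --sense --no-readline --args replays the first process non-interactively. Any message printed just before the exit code line is the part worth comparing between the base image and the custom image.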
To conclude: CDSW seems to do something behind the scenes when launching the Workbench - R editor that causes R to crash, but I cannot get to the bottom of what the issue is.
Any ideas?