About CCP Data Scientist Exam - Tools

New Contributor
Hello, I need advice on the tools for the CCP Data Scientist certification. Here are my questions, based on the website description below:

- Although R is in the list of tools, I couldn't find any mention of RStudio. Personally, I prefer to code in RStudio rather than in base R. I downloaded the Cloudera Solution Kit and installed VirtualBox, but the VM is unable to connect to the internet, so I can't install RStudio myself. How do I solve this?
- Will it be the same in the actual exam, even though the description states that the cluster is open to the internet and that you can use other software? I assume I'm not restricted to the R packages on the cluster and can use the ones that are part of my regular workflow.

############## Website Description #################

All CCP: Data Scientist exams are remote-proctored and available anywhere, anytime. See the FAQ for more information and system requirements.

Exams are hands-on, practical exams using data science tools on Cloudera technologies. Each user is given their own 7-node, high-performance CDH5 (currently 5.3.2) cluster pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others (See a full list).

In addition, the cluster also comes with Python (2.6 and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, NetBeans, scikit-learn, octave, NumPy, SciPy, Anaconda, R, plyr, dplyrimpaladb, SparkML, vowpal wabbit, clouderML, oryx, impyla, CoreNLP, The Stanford Parser: A statistical parser, Stanford Log-linear Part-Of-Speech Tagger, Stanford Named Entity Recognizer (NER), Stanford Word Segmenter, opennlp, H2O, java-ml, RapidMiner, caffe, Weka, NLTK, matplotlib, ggplot, d3py, SparkingPandas, randomforest, R: ggplot2, Sparkling water.

Currently, the cluster is open to the internet and there are no restrictions on tools you can install or websites or resources you may use.

Expert Contributor

The Data Scientist exam is a proctored, remote exam. It does not use VirtualBox or a virtual machine; it has a completely different environment.


During the exam you will have access to the Internet and can install any tool that you would like to use.