12-10-2015 07:21 AM
02-01-2016 08:17 AM
I totally agree. I've gone through the latest reference list. It's actually two years of work if you treat every entry seriously.
Is there any more precise guidance?
By the way, the URL for the blogs of people who have passed the new exam doesn't work.
02-01-2016 08:34 AM
The main page: http://cloudera.com/training/certification/ccp-ds.html
Certified data scientists: http://cloudera.com/training/certification/ccp-ds/class.html
It is the most elite certification in the field, with only a few dozen people in the world having passed. I would think that learning everything you need to know in order to be a Data Scientist is more than a two-year endeavor.
02-01-2016 08:58 AM
07-25-2016 12:07 PM
I have been going through the DS200 Solution Kit and this forum to understand the objectives of the new CCP-DS. I am a little disheartened to learn that the content of the new exam is the same as that of the old exam.
Since then, I have been curious to find out which tools are used for data science and machine learning in the CCP-DS exam.
As for the DS200 Solution Kit, the emphasis is on:
Data exploration of JSON datasets using MapReduce Streaming (Python)
Data cleaning of JSON datasets using MapReduce Streaming (Python)
Classifying using the Simlink algorithm
Clustering using Cloudera ML
Building a recommender system using Mahout
But I have observed that many organizations, including mine, have started using Spark RDDs and DataFrames more and more for data exploration, cleaning, and transformation via Spark's Scala or Python APIs.
Most of the common machine learning techniques, such as classification, clustering, and collaborative filtering, can be implemented using Spark MLlib or H2O on Spark.
Also, the CCP-DS certification page at http://www.cloudera.com/training/certification/ccp-ds/exams.html
lists "Data Science at Scale Using Spark and Hadoop" as one of the study resources. This makes me even more curious, and I just wanted to know:
Can I use Spark RDDs or DataFrames (Scala / Python API) and Spark MLlib in the actual CCP-DS exam?
Is the Spark Scala API supported, or should I use only the Spark Python API?
Also, can I use Python with scikit-learn and Pandas to solve some inferential and descriptive statistics problems in the exam?
Can the resource guide be updated to include alternative technologies such as Spark instead of MapReduce and Mahout? If not, I would say it's quite a fumble to have to start learning Mahout from scratch after learning and implementing ML problems in Python scikit-learn or R all these years, and lately moving on to Spark MLlib. I would bet there are several data science / ML folks in the same boat as me.
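For what it's worth, by "descriptive statistics problems" I mean work along these lines; a minimal Pandas sketch with invented sample data:

```python
# Minimal sketch of descriptive statistics in Pandas; the scores are
# invented sample data, not from any exam.
import pandas as pd

scores = pd.Series([72, 85, 90, 68, 77, 95, 81], name="exam_scores")

summary = scores.describe()  # count, mean, std, min, quartiles, max
mean = scores.mean()
median = scores.median()
iqr = scores.quantile(0.75) - scores.quantile(0.25)  # interquartile range
print(summary)
print(mean, median, iqr)
```

If this style of analysis (rather than MapReduce Streaming) is acceptable in the exam environment, it would save a lot of relearning.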
Any help or answer in this regard would be highly appreciated and would receive a big welcome from the many data science folks looking for a similar answer in their endeavor to achieve the CCP-DS certification.