
What will happen to the people who took DS-200 and passed?


Hello,

 

According to http://www.cloudera.com/content/cloudera/en/training/certification/ccp-ds.html the whole procedure for becoming a Cloudera Certified Data Scientist has changed.

 

But an important question has not been answered:

 

  • As a person who passed the DS-200 Data Science Essentials test, but did not complete a practical challenge, what am I supposed to do?

I can read that three new exams have been announced, but Cloudera needs to take into account the fact that people like me:

 

  1. Paid for and attended the Data Science training.
  2. Studied hard for DS-200 exam.
  3. Paid for, took and passed the DS-200 exam.

Is Cloudera's plan to tell us something like: "Forget about the above. Wait for the new exams, and then we'll see"?

 

I think an announcement (without the required details) is not enough; Cloudera needs to provide an explanation, including a justification.

 

Kind regards.

1 ACCEPTED SOLUTION

Master Collaborator

Let me try to expand a bit on the main part of the announcement that I believe you're asking about.

- DS-200 candidates have their achievement listed as part of their public transcript and license and they can use that with employers currently. That hasn't been taken away from them. I see it listed in many places on LinkedIn and it currently has value by itself in the market. That remains.

- DS-200 holders will be given early access to the new exams and will receive a discount.

 

I do not, however, want DS-200 holders to be able to "skip" one of the "challenge" exams, not only because that would violate international testing standards, but also because I know what I see in the data for those who pass DS-200 versus those who pass the challenge, and I feel it's imperative to keep the hands-on requirements in place to protect the integrity of the process and the exam. Everyone who is CCP today passed four segments (DS-200 + 3 challenge sections).

 

Here is the relevant section of the announcement. I'll try to add commentary in red.

 

We have several goals behind making these changes:


  • Reduce the amount of work a candidate must complete in areas they have already demonstrated mastery


Many very qualified candidates turned in perfect or near-perfect challenge scores on parts one and two, then got pulled away by a project, received a 0 for part three because of circumstance, and thus failed. On the next challenge, they had to redo all three parts, since the parts were somewhat integrated and doing only one compromised the validity. As such, we designed the last challenge as three discrete parts, and now we're taking it further: you only have to retest on the sections you failed. This gives candidates more flexibility and allows those who are qualified to get through the program on their own time, rather than forcing them to (a) start over and complete the whole challenge and (b) work within a time window that may not suit them.

 

  • Provide year-round access to the certification, giving candidates greater flexibility. We recognize the time-bound nature of the challenge often prevented well-qualified candidates from participating.


I think my answer directly above addresses this.

 

  • Streamline the program cluster infrastructure and provide all candidates with a level exam scenario
  • Streamline the grading and scoring process to provide candidates quick and relevant feedback.

We saw all kinds of submissions in the past 18 months: some people working on laptop VMs where one pass over the data took four weeks; some working on an Oracle BDA where it took no time. We assumed early on that if you can do this work (data science on very large datasets), you have access to a cluster environment. We didn't have the ability to provide each candidate one for three months; we tried multi-tenant providers and the like, but it was less than ideal, and we needed to protect the security and integrity of the exam. Early on this worked, but as demand increased, more and more people wanted to test -- simply put, more and more people wanted in. Many are students who no longer have access to their university infrastructure, or they work in sensitive fields and cannot run the test on their employer's cluster.

 

We set out to build a performance-testing cluster infrastructure that we could scale globally for all kinds of hands-on exams. That infrastructure is now complete (or near-complete; we're still performance-tuning), which means that anyone in the world, with a laptop and a browser (even a cheap Chromebook), can test on a live, multi-node, cloud-based big data infrastructure with very large compute instances. As such, we wanted to migrate the exams to a level playing field.

 

There are many more reasons, but let me briefly address two:

1. Predictive validity per unit of time

2. Cheating/theft

 

1. We're always evaluating for predictive validity per unit of time, which asks: "How much of the objective set do we need to present to a candidate to evaluate them and make a predictive judgment?" All exams are predictive -- a certification is Cloudera offering a predictive assessment of someone's ability in a job role. We noticed fairly early that the challenge format, while it had a lot going for it, had poor predictive value per unit of time, in the sense that people could rathole forever and we couldn't stop them. It scores almost perfectly on all kinds of psychometrics, but, simply put, we could have passed or failed people a lot faster. Many people looked at the challenge problem, and while the answer was "to the north" (let's say), they turned directly south and then spent 100+ hours trying to make "south" work when it was never going to. And when they got a 0 as their score (or less than 30%), they were angry and confused. As more people wanted in on the field, this issue increased substantially.

 

The open-ended nature of the challenge was great for those who could pass easily -- some scored near-perfect in a few hours -- but the borderline candidates spent inordinate amounts of time circling the wrong approach. There are a number of reasons for that I won't go into, but I know what I see in the data and the submissions. So we set out to shorten the feedback loop and tighten the predictive validity per unit of time. We're still fine-tuning what we can cut and what we can include.

 

2. As DS-200 stayed live, people were taking it many times over and its predictive validity decreased. It was also shared all over the internet. This is true of all multiple-choice tests, and we were facing an enormous investment to keep it fresh and foil the cheaters. All of that would be fine in isolation, but combined with the next issue, we had to reshape a number of components.

 

Further, the challenge as we ran it required a certain honor code: we gave you the challenge, we didn't proctor, and we used DS-200 to gate some of that. However, we noticed that a particular company was having someone pass them the challenge, solving it with a team of people, and then posting the solution publicly to their blog -- basically, how to solve it. It might have been coincidence, and we're not going to point fingers, but it got us looking at other security issues, and, sad to say, we found enough evidence to know there were a number of entities out there predisposed to trying to bring this program down. This is unfortunate and really sad in some ways, but we realized we needed to tighten security as much as we could.

 

So given all the above, plus thousands of other pieces of data, discussions, and feedback, and knowing that the cost will increase because we have to secure it, proctor it, etc., we opted for the stated changes.

 

 


Data science is an evolving field and the technologies available to us today have evolved. As such, we felt that now was a good time to adapt the program to meet the changing needs of the community and to encourage ongoing professional growth and development in this exciting field.


3 REPLIES

Master Collaborator

I'll reply in two posts: the first is just the announcement, since the full text wasn't referenced; the second is my take on the rationale.

 

Here's the full announcement:

Cloudera is making several changes to its CCP: Data Scientist certification program, effective immediately.
 
We are lifting the requirement that a candidate pass the DS-200: Data Science Essentials written test before being allowed to move on to the challenge. We will no longer offer either DS-200 or the DS-200 practice test.
 
We are making the following changes to the Data Science Challenge: 
- Cloudera will now provide candidates a fully configured CDH-based data science cluster rather than requiring or allowing candidates to provide their own.
- The Challenge is composed of three sections; each is now offered as a separate exam, and students must take and pass all three in order to achieve the qualification of CCP: Data Scientist. Unlike the current Challenge, though, students need only retake any specific exams they fail, rather than having to complete a full new challenge. The exams may be taken in any order. The exams are:
- DS700 - Descriptive and Inferential Statistics
- DS701 - Unsupervised Machine Learning
- DS702 - Supervised Machine Learning
- The exams will remain fully hands-on on a live cluster with large datasets, but will now include remote monitoring by a proctor via webcam, video, and audio, in a secured cluster environment. Candidates may take the exam at their home or office location; attendance at a testing center is not required.
- The three exams can be taken in any order and at any time, as long as all three are passed within 365 days of each other. Candidates who fail an exam will need to pay to retake that exam; no free retakes will be offered.
- The new exams will be offered beginning July 2015. At that time, we will publish the cluster specifications and the tools candidates may install.
 
We have several goals behind making these changes:
- Reduce the amount of work a candidate must complete in areas they have already demonstrated mastery
- Provide year-round access to the certification, giving candidates greater flexibility. We recognize the time-bound nature of the challenge often prevented well-qualified candidates from participating.
- Streamline the program cluster infrastructure and provide all candidates with a level exam scenario
- Streamline the grading and scoring process to provide candidates quick and relevant feedback.

Data science is an evolving field and the technologies available to us today have evolved. As such, we felt that now was a good time to adapt the program to meet the changing needs of the community and to encourage ongoing professional growth and development in this exciting field.
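As an aside, the "passed within 365 days of each other" rule in the announcement amounts to requiring that the spread between a candidate's earliest and latest pass dates not exceed 365 days. A minimal sketch of that check, assuming simple calendar dates (the function name and example dates are illustrative only, not from Cloudera):

```python
from datetime import date

def all_within_window(pass_dates, window_days=365):
    """Return True if every pass date falls within `window_days`
    of every other, i.e. the max spread fits in the window."""
    if not pass_dates:
        return False
    spread = max(pass_dates) - min(pass_dates)
    return spread.days <= window_days

# Three exams passed over roughly ten months -> within the window
dates = [date(2015, 8, 1), date(2016, 2, 15), date(2016, 5, 30)]
print(all_within_window(dates))  # True
```

Because only the earliest and latest dates matter, the order in which the three exams are taken is irrelevant, which matches the "any order, any time" wording above.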

Master Collaborator

Eh... I can't get the HTML red font color to stick. Hopefully my additions are clear.