DS 200 Challenge Pre-Requisite


I just took the practice test for DS-200 and got a 38% despite studying for four months.  The recommended reading did not help me pass the exam: many of the questions required knowledge that was not discussed in the recommended reading and used terminology I never saw there.

Though I could rant for hours about how frustrated I am with Cloudera, that is not why I'm writing.  The DS-200 challenge packet opens on March 31st, and I know I need to pass the exam before starting the challenge.

What's not clear to me from your website is whether I need to pass the exam before the challenge packet opens, or just before I start.  In other words, if I pass the exam at the end of April, will I be allowed to start the current challenge packet?  Or will I be forced to wait until the next one?


Re: DS 200 Challenge Pre-Requisite

Super Collaborator

Michael,

From this page, the prerequisite for registration is completing DS-200: http://cloudera.com/content/cloudera/en/training/certification/ccp-ds/challenge/register.html

 

The page doesn't address your specific question on timing, so here is the answer: yes, you have to pass DS-200 before you can register for the challenge. If you pass DS-200 in April and a challenge is open, you can register. You will have less time, but not everyone uses or needs the entire time. Some passed challenge one in 8 hours; some spent 40+ hours. Skills, background, etc. vary (widely) in such a new field. It's your call on how much time you're willing to give it.

 

As to your frustration: you say it is with Cloudera, but I assume you mean the CCP:DS program, so I'll address that. Also, I'm looking at our database of scores and I don't see yours, so I believe the 38% is from the practice test, not DS-200.

 

Anyway, to your frustration: I hear it. As the program manager, I get a lot of anger directed my way about data science in particular, and it is something that keeps me awake... well, it's after 8pm here, so it keeps me at my desk. I can only offer you my perspective.

 

It's a new field, and not a particularly easy one. When people working in the field (whom we use as subject-matter experts) take DS-200, they get nearly perfect scores and breeze through the challenge in short order. People outside that arena of subject-matter expertise really, really struggle. I mean struggle; the distance is often striking. For example, one person started at Cloudera and asked to take DS-200: they scored nearly perfect on DS-200 (missed a single item) and finished the challenge in 8 hours. Others who came to CCP:DS challenge one had a very, very different experience, some to the point of anger surpassing anything I've experienced in 23 years. And yet, when I look over the hundreds of scores, I see a really good discrimination index and point-biserial correlation between items and scores, and between candidates and performance. The numbers all look really good to me on these tests; and yet, there are a lot of people just plain frustrated by the field.
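
For readers unfamiliar with the item statistic mentioned above, here is a minimal sketch of a point-biserial correlation between one test item (right/wrong) and candidates' total scores. The function name and the sample data are hypothetical, made up purely for illustration; this is not Cloudera's scoring code, and the reply does not say how the numbers were actually computed.

```python
from math import sqrt


def point_biserial(item_correct, total_scores):
    """Point-biserial correlation between a 0/1 item and total scores.

    Hypothetical helper for illustration only. It is equivalent to the
    Pearson correlation when one of the two variables is binary.
    """
    if len(item_correct) != len(total_scores):
        raise ValueError("inputs must have the same length")
    n = len(total_scores)
    right = [s for c, s in zip(item_correct, total_scores) if c == 1]
    wrong = [s for c, s in zip(item_correct, total_scores) if c == 0]
    if not right or not wrong:
        raise ValueError("item needs both correct and incorrect answers")
    mean_all = sum(total_scores) / n
    # Population standard deviation of the total scores.
    sd = sqrt(sum((s - mean_all) ** 2 for s in total_scores) / n)
    if sd == 0:
        raise ValueError("total scores have no variance")
    mean_right = sum(right) / len(right)
    mean_wrong = sum(wrong) / len(wrong)
    p = len(right) / n  # proportion who answered the item correctly
    return (mean_right - mean_wrong) / sd * sqrt(p * (1 - p))


# Made-up data: 1 = answered the item correctly, 0 = did not.
item = [1, 1, 0, 0]
scores = [4, 3, 2, 1]
r = point_biserial(item, scores)
```

A value near 1 means candidates who got the item right also tended to score high overall, which is the sense in which an item "discriminates" well; a value near 0 or below suggests the item is not separating strong candidates from weak ones.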

 

 

Add to that, with respect to data science: it's not Java or Linux admin, where a broad range of people have years of knowledge, exposure, and familiarity with the idiom and the nuance. The questions we ask often exploit that; this is what makes a good assessment, that it discriminates in favor of people with lots of hands-on experience. We ask lots of questions that tend to bank-shot off an idea into something more nuanced.

 

That said, if you come at it from a study perspective rather than an experience perspective, wanting to learn, it's really opaque at times. I struggle with this because most certifications are pretty easy (take a practice test, study a bit, and nail it), so people come to CCP, bump into that chaos, and it can ignite some heat.

 

We do want to grow the community without sacrificing the high standards of the CCP program. And we're working in that direction as fast as we're able to move given our resources. We released version 1 of the practice test and will continue to update it -- adding items, adding depth, adding clarity.  We recently released a solution guide (process and approach) to challenge one and we're working on how better to grow the community and provide those resources to help train up people.

 

I say all this not to discount your criticism or to invalidate how you feel. I've made a very conscious decision to make this program hard, perhaps the hardest in the industry. The bar is: we'd hire you or we'd recommend you to our customers. Full stop. That said, at this time, I'm not sure we can transmit or teach you everything we're going to throw at you. We're going to keep working on materials to grow the community: solutions, approaches, webinars, training classes. And if we've made this more confusing than it should be, I own clarifying and cleaning it up. I welcome an ongoing discussion with the community on ways to do that.

Re: DS 200 Challenge Pre-Requisite

Super Collaborator
And I have no idea why the board turns the program name into an emoticon... I guess a colon followed by a D means something else to it. I keep typing CCP_colon_DS and it shows up (for me) as a smiley or angry face.

Re: DS 200 Challenge Pre-Requisite

BTW, I think Cloudera should seriously think about adding more certifications.  Specifically, I would add one similar to the data science certification, but without all of the machine learning material: a cert that challenges people to solve a real-world problem using Hadoop and lots of related technologies, but no machine learning stuff.  The Hadoop core certification is like playing tee-ball, and the DS cert is like playing in the World Series.  Can't we have something in between?

Re: DS 200 Challenge Pre-Requisite

Expert Contributor

Ask and ye shall receive.  The next step for the certification program is to do exactly that.  We will introduce a developer certification that is a similar format to the data science certification, where you'll have to pass the exam and then do a practical.  The practical will be much more in line with what you were expecting from the data science certification.  The general idea will be, there's data over there, there, and there.  Get it, combine it, transform it like so, so, and so, and stick it over there -- Go!  There are still details to be fleshed out, but that's the general plan.  Look for that later this year.

Re: DS 200 Challenge Pre-Requisite

Brad,

Thanks for the thorough reply. I'm glad to know that I can start the challenge packet late if I have to.

Yes, it was the practice test I took, not the real exam.

Saying I'm frustrated with Cloudera, not just the DS-200 cert, is not a mistake.  I'm frustrated with Cloudera for a lot of reasons, but I should start with the test-related gripes.

I understand that you want this certification to be challenging, and I can't blame you for that, but that doesn't excuse you from misleading people.  There are a lot of things on the study guide that I spent weeks studying, and it didn't help me at all.  That being said, let me tell you what I read:
- Hadoop: The Definitive Guide, version 2: chapters 1-8
- Hadoop in Practice: most of chapters 1-5 (techniques 1-27)
- HBase in Action: chapters 1-4
- Mahout in Action: chapters 2, 3, 4, 7, 8, 9, 13, 14, 15 (pretty much everything except the "take it to production" chapters)
- Collective Intelligence: chapters 2, 3, 6, 7, 8, 9, 10, 12
- Algorithms of the Intelligent Web: chapters 3, 4, 5
- online documentation for Pig and Hive

Very few of the questions on the practice exam could have been answered from those books alone.

I am no idiot, and I read all of that, but it only got me 38% on the test!  What I read is probably 90% of what you recommended.  The one thing I skipped was the "Pattern Recognition and Machine Learning" book.  You only tell people they need to read the first two chapters.  Looking over the test questions now, I think the vast majority of the material came from that book.  It also looks like the first two chapters alone are not enough; I would have to read many other chapters from that book.

I read the recommended reading for data visualization, but it didn't actually teach me anything; it seems meant more to inspire than to teach.  Indeed, there were questions on the test that I couldn't answer. Specifically, I didn't know what a "box plot" or a "tree map" was, or at least I never called them by those names.

Fortunately I wrote down everything I saw on the test but did not see in any of the recommended reading:
- logistic regression**
- convex functions**
- Beta distribution*
- chi-squared distributions***
- SVM: the Collective Intelligence book mentions it very briefly but gives almost no explanation of how it works, and its explanation did not help me answer the test question
- PCA** (chapter 12 of Pattern Recognition)
- QR decomposition***
- sentiment analysis***
- outlier detection
- Laplace smoothing***
- Lloyd's algorithm***
- eigenvectors***
- L1, L2***
- Hidden Markov Models** (chapter 13 of Pattern Recognition)
- Simpson's paradox***
- support vector count (I had heard of SVMs, but not their count; as I said, the Collective Intelligence book barely mentions them)
- area under the ROC curve***
- underflow, overflow***
- "cost of clustering": do you mean the RAM and time requirements?
- "how well it fits the data": I don't recall reading about any measurement of that.
- "least mutual information"
- questions like "how many points do you need in a sample based on number of attributes", not covered in any of the assigned reading
- a question about Impala, though there is no recommended reading on that
- natural language processing

* only in the Pattern Recognition book
** it's in the Pattern Recognition book, but not the first two chapters
*** I can't find it in ANY of the assigned reading

Above I listed everything I saw on the test but not in the assigned reading.  Below I list things you recommended reading about, but that were not on the test:

- There was not a single question on Avro
- not a single question on Pig (though there were two on Hive)
- not a single question on Flume
- only two questions about MapReduce
- no questions about HDFS, sequence files, Avro files
- no questions about compression using Snappy, Deflate, etc.
- no questions about wget or curl
- no questions about Hadoop streaming
- no questions about Python frameworks for Hadoop
- no questions about Mahout!

Your study guide is very misleading. You need to update it and tell people what they REALLY should be studying.  Give people an idea of how much weight each topic carries on the test.  Remove the recommended reading items that don't actually help people.

It seems like this test is more geared towards people with a very advanced mathematical background.  I thought it was going to be more about programming than math.

Here are the other reasons I'm frustrated with Cloudera:

- The data science training class was absolutely useless for me. I don't think I learned anything from it, yet I spent thousands of dollars on it, lost compensation from work, and wasted three days of my life.
- Your whole website is one of the most difficult to use I have ever encountered.  It is really difficult to navigate.
- I have a lot of trouble finding information and links on your website.  I still can't figure out how to get to my profile without clicking the link from my email.
- Often I sign in to the website, navigate around, and then discover that I am not signed in any more.  This is really annoying because your website keeps prompting me to enter my personal information when I want to download something or view training.  When I sign in again, I can't find the page I was on.
- On more than one occasion I have posted something to the community and it was lost, usually because I was asked to sign in after posting, and Cloudera lost my post.  I have learned to write posts separately now.
- Website aside, I feel like your documentation is confusing and tough to follow.  You have a manual for installing a bunch of your products but no instructions on how to use them.  If people follow the instructions from external sites (Hadoop, Hive, Pig, etc.), they don't work because you have customized things so much.
- During the practice test, it was impossible to revisit questions I had already answered.
- One question on the practice test was word for word the same as the previous question, but had a different answer.
- The practice test used different terminology from the books, which made things confusing.
- A lot of the recommended reading was out of date and won't work with CDH 4.5.

Honestly, if I had a choice I would not be working with Cloudera, but there is no alternative.

Re: DS 200 Challenge Pre-Requisite

Expert Contributor

Michael,

 

I think you hit the nail on the head: "I thought it was going to be more about programming than math."  It sounds like we need to do a better job of making that clear.  Data science should be more about math than programming.  The data scientist certification is in no way 'the next step' after the developer certification.  It presupposes a solid background in statistics, linear algebra, calculus, and machine learning.  The required reading list is written largely from that perspective.  I'll work with Brad to make that clearer.

 

The items in your list of things you didn't learn from the suggested reading fall into two categories: things that you would learn in a machine learning or statistics program, and red herrings that don't actually exist. Recognizing the red herrings as such again takes a machine learning or statistics background.

 

With respect to the certification, our introduction to data science class is really more of a teaser than sufficient education.  It tells you the sorts of things you'll need to go out and learn in order to have a shot at passing the certification, but there's no way on earth we could cram that much content into three days with no real prerequisites and have anyone survive it, much less remember it all.

 

We'll pass your concerns about the website on to the marketing team.  I agree that there's room for the navigation and overall user experience to be better.

 

On the two questions that were identical, if you want to drop me an email with details, I'll look into it.  I suspect I know what it was, though.  There are a couple of questions in the practice pool that share the same question text but have completely different lists of possible answers.  It's like asking, "Which of these is a reptile?"  The question could have many different multiple-choice lists, each with a different correct answer: (dog, cat, chicken, lizard), (horse, snake, pig, rabbit), etc.

 

Daniel

Re: DS 200 Challenge Pre-Requisite

New Contributor

I think what Cloudera ought to genuinely consider doing is including more accreditations. In particular, I would add one like the information science accreditation, yet without the majority of the machine learning material.

I love idioms and Cloudera.