CCA 175 Certification - Study Material

New Contributor

Hi,

 

I want to take the new CCA175 certification exam sometime in January. Can someone provide details on study materials I can refer to when preparing for this exam? Since this certification is pretty new, I don't see many references on Google.

25 REPLIES

New Contributor

Hi,

I am also looking for guidance and study material, as I plan to take the exam at the end of January. Please let me know if you have any updates.

 

Uma

New Contributor

Hi, were you guys able to find any study material for this exam?

New Contributor

Any update on this?

Expert Contributor

The site for the exam is here:  http://www.cloudera.com/content/www/en-us/training/certification/cca-spark.html

 

It is a hands-on exam, so there is no book that is going to give you all of the answers.  The Cloudera Developer Training course covers the topics and gives you hands-on exercises to practice.

New Contributor

Where can we find study material? What do we need to study, and what kinds of questions will be asked?

New Contributor

@juddimal wrote:

The site for the exam is here:  http://www.cloudera.com/content/www/en-us/training/certification/cca-spark.html

 

It is a hands-on exam, so there is no book that is going to give you all of the answers.  The Cloudera Developer Training course covers the topics and gives you hands-on exercises to practice.



That sounds like we cannot be confident unless we sign up for Cloudera's training. I welcome the performance-based skill testing method, but Cloudera should not force us to sign up for its training by refusing to spill the beans.

I also need to know whether we are required to code in Scala AND Python for all the problems.

Super Collaborator
Our Cloudera Developer Training for Spark and Hadoop teaches you everything you need to know for the exam and gives you days' worth of hands-on practice. We train and test on the same objectives (although we train on more, of course). If you do not wish to take our training or cannot afford it, there are hundreds of free resources on the internet.

For example, if I point my browser to the list of objectives on our website, copy the first objective, and search on it, I get dozens of free docs and training resources on that skill.

• Import data from a MySQL database into HDFS using Sqoop
• sqoop.apache.org docs on importing data using Sqoop
• Importing Data into HDFS using Sqoop
• Video demonstration of a user importing data from MySQL to HDFS using Sqoop

And there are dozens more. As the exam page also tells you that you have access to some of these during the exam, I suggest becoming thoroughly familiar with those.

If you take some time to search each objective and learn the skill, the exam will be easy. If you don’t want to take the time to learn on your own, and you’re not learning on your job, then training is your answer. This is always true, of all technical skills, not just this one exam. And part of the learning is figuring out which free resources are the best if that’s the route you choose to go, or which paid ones are.

As for the style of question, it’s quite straightforward: for the above objective, you’ll be given the location of a MySQL database on one of the nodes and you’ll be required to use Sqoop to import a portion or all of that data from the MySQL instance we give you into the HDFS instance we give you. It's really that straightforward if you take the time to learn the skills.
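To make the Sqoop objective above concrete, here is a minimal sketch of the kind of command such a task calls for. It is written as a small Python helper that only assembles the command line, so it can be read and checked without a cluster. The host, database, table, user, and paths are all made-up placeholders, not values from the exam; the flags themselves (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--where`, `--num-mappers`) are standard `sqoop import` options.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table, target_dir,
                       where=None, num_mappers=1):
    """Return the argument list for a `sqoop import` of one MySQL table."""
    args = [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the MySQL database
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,
        "--target-dir", target_dir,        # HDFS directory for the output
        "--num-mappers", str(num_mappers),
    ]
    if where:
        # Import only a portion of the table instead of all rows.
        args += ["--where", where]
    return args


# Hypothetical values for illustration only.
cmd = build_sqoop_import(
    jdbc_url="jdbc:mysql://db.example.com/retail_db",
    username="exam_user",
    password_file="/user/exam_user/.mysql.password",
    table="orders",
    target_dir="/user/exam_user/orders",
    where="order_status = 'COMPLETE'",
)

# Print a shell-safe version of the command that could be pasted into a terminal.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

Leaving out `where` produces a command that imports the whole table, which covers the "all of that data" case.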

New Contributor

Thanks for the reply, Brad. Good to hear from Cloudera folks.

As you pointed out, importing data from MySQL sounded really simple to me, and that's what made me wonder, "what else is in there?"

I'm a self-learner, relying on Coursera (for Scala and Python), edX (Spark), and the Cloudera VM. Will there by any chance be sample exercises on the Cloudera VM in a future release? That would help the developer community a lot, since not all of us get an opportunity to work on all of the exam objectives at work or school.

And about my other question: do I need to write programs in both Scala and Python? I'm good at Python and just getting started with Scala.

Super Collaborator
Yes, Scala and Python.

New Contributor

When we answer a question in the exam and compile (submit) it, does the exam tell us whether the answer was correct (I mean for individual questions, not for the entire exam)?

New Contributor

Thanks for clarifying on these questions. Do we need to know both Scala and Python? Is Scala alone not enough for coding?

Community Manager

From the community FAQ on certification:

 

The answer is yes, there are questions using both languages.

 

However, please remember that the goal of the exam is to test your Spark knowledge, not your Scala and Python knowledge.  The development questions typically provide you some code and ask you to fill in TO-DO sections.  So the key is to understand the Spark API.  You must have some knowledge of programming, as you will need to be able to read the existing code and understand how to store and retrieve the results you get back from calling the API, but the focus will be on you adding the Spark calls.
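As a rough illustration of the "fill in the TO-DO sections" format described above, here is a made-up template in that spirit. It is not an actual exam template. To keep it runnable without a cluster, it uses plain Python built-ins, and the comments name the Spark RDD calls (`textFile`, `map`, `filter`, `reduceByKey`) that would play the same roles; the function name and the input layout are assumptions.

```python
from collections import defaultdict


def count_orders_by_status(lines):
    """Count orders per status from CSV lines shaped like 'order_id,status'."""
    # Provided by the (hypothetical) template: split each line into fields.
    # Spark analogue: sc.textFile(path).map(lambda line: line.split(","))
    records = [line.split(",") for line in lines]

    # TODO filled in by the candidate: keep only well-formed records.
    # Spark analogue: .filter(lambda fields: len(fields) == 2)
    valid = [fields for fields in records if len(fields) == 2]

    # TODO filled in by the candidate: count the records per status.
    # Spark analogue: .map(lambda f: (f[1], 1)).reduceByKey(lambda a, b: a + b)
    counts = defaultdict(int)
    for _order_id, status in valid:
        counts[status] += 1

    # Provided by the template: return the results in a stable order.
    return sorted(counts.items())


# Made-up sample input, including one malformed line that gets filtered out.
sample = ["1,COMPLETE", "2,PENDING", "3,COMPLETE", "malformed"]
print(count_orders_by_status(sample))
```

The point of the sketch is the division of labor: the surrounding code is given, and the candidate supplies the transformation calls in the marked sections.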


Cy Jervis, Manager, Community Program
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.

New Contributor

Bumping this slightly to get more up-to-date answers on study materials. There's a book on Amazon for CCA 175 which looks decent but seems to be a cheat sheet more than anything else.

Other certifications, e.g. the SAS advanced cert, do offer a study guide too. It's a shame that Cloudera doesn't do that.

Explorer

It's good that the "Required Skills" section is pretty specific, but there is some contradiction.

The "Exam Question Format" section mentions Impala, as quoted below:

"In some cases, a tool such as Impala or Hive may be used."

But the "Required Skills" section makes no mention of Impala.

Similarly, the "Required Skills" section doesn't mention Pig at all. Does that mean Pig will not be asked about in the exam? However, Pig is included in the list of resources available on the exam cluster.

The tools mentioned in the "Required Skills" section are:

• MySQL
• HDFS
• Sqoop
• Flume
• hadoop fs
• Spark
• Hive metastore
• Avro
• JSON files

As you can see, the list does not include Pig or Impala.

Expert Contributor

The CCA-175 exam is a hands-on, scenario-driven test.  You will be asked to solve problems.  The grading will be based on the solutions that you provide.  We do not evaluate the tools or the code that you used to solve the problems.

 

Can you use Pig?  Absolutely.  It may be possible to solve every one of the problems on the exam using just Pig if you are an expert.  It may not be practical, however.  There may be better tools for interacting with the Hive metastore to do DDL, such as Impala, Hive, HUE, HCatalog, etc. Similarly, the coding questions will give you Spark templates to add code to.  You may not have enough time in the exam to code everything from scratch in Pig.  Pig is not listed in the Required Skills, because there will not be a specific question where you are required to use Pig on the exam.

 

The Required Skills section contains this line:

  • Use Data Definition Language (DDL) to create tables in the Hive metastore for use by Hive and Impala.

 

There is no contradiction.  This is a "problem solving exam" as opposed to a "tools exam".  You will need to be able to use Hadoop tools to generate solutions, but the tool that you use is up to you.
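For the DDL objective quoted above, a minimal sketch of what such a statement can look like is shown here. It is a small Python helper that only builds the statement text, so nothing in it needs a cluster to run. The table name, columns, and HDFS location are made-up placeholders, and the choice of an external Parquet table is an assumption for illustration. The resulting `CREATE EXTERNAL TABLE ... STORED AS ... LOCATION ...` syntax is accepted by both Hive and Impala, so whichever tool you prefer can run it.

```python
def create_table_ddl(table, columns, location, file_format="PARQUET"):
    """Build a CREATE EXTERNAL TABLE statement for the Hive metastore.

    `columns` is a list of (column_name, hive_type) pairs.
    """
    column_list = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n"
        f"  {column_list}\n"
        f")\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{location}'"
    )


# Hypothetical table definition for illustration only.
ddl = create_table_ddl(
    table="orders",
    columns=[("order_id", "INT"), ("order_date", "STRING"), ("order_status", "STRING")],
    location="/user/exam_user/orders",
)
print(ddl)
```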

New Contributor

Hi Brad,

For the Spark part of the exam, can I choose either Python or Scala? Or do I have to learn both Python and Scala to pass the exam?

 

Thanks

 

--Judith

New Contributor

Hi Brad,

I took the Developer Training for Spark and Hadoop last month, and I am planning to take CCA175. But when I look at the skill set here, http://www.cloudera.com/training/certification/cca-spark.html, I see that the Data Analysis part was not covered in the course, and Kafka and Flume were covered instead.
Can you please confirm whether CCA175 will include the updated content (with Kafka and Flume), or whether it will still include Hive and Impala? And how long will CCA175 continue to include the Data Analysis part? Please advise, as I am planning to take the certification soon.

Thanks
Priti

New Contributor

Hi Brad, 

 

I am planning to take my Cloudera CCA 175 exam soon. I found that there is very little mentioned about Hive (topics like partitioned/bucketed tables) or Avro (topics like schema evolution) in the objectives on this site: https://www.cloudera.com/more/training/certification/cca-spark.html.

 

 

Does this mean there will be no questions on the advanced topics in Hive or Avro in the exam?

Community Manager

I looked over the CCA175 page, and while there is little mention of Hive or Avro, one particular section stands out.

 

Exam Question Format

Each CCA question requires you to solve a particular scenario. In some cases, a tool such as Impala or Hive may be used. In other cases, coding is required. In order to speed up development time of Spark questions, a template is often provided that contains a skeleton of the solution, asking the candidate to fill in the missing lines with functional code. This template is written in either Scala or Python.

You are not required to use the template and may solve the scenario using a language you prefer. Be aware, however, that coding every problem from scratch may take more time than is allocated for the exam.

 

Perhaps you are looking at the exam as individual questions rather than tasks to complete. I would look over the required skills portion of that page and think about how you would accomplish each task. If Hive and Avro would be involved in your process to complete a task, then prepare accordingly.

 

I hope this helps. 


Cy Jervis, Manager, Community Program

New Contributor

You marked this as solved... But that is not correct. It is not solved...

 

We understand Cloudera Training provides all the necessary information for this exam... 

 

Any certification should provide study materials...

 

Especially when the training is not mandatory, and for those who don't want to take an expensive training (not everyone will be able to take it) and would rather use study material and do the hands-on practice on their own to get this certification...

It would be great if Cloudera could provide study material (a paid one) for this course (I am not talking about free study material)...

 

Thanks.
