The question we see most often, here and elsewhere, concerning the CCA175 is "Do I need to know both Scala and Python?" The answer is yes: there are questions using both languages.
However, please remember that the goal of the exam is to test your Spark knowledge, not your Scala and Python knowledge. The development questions typically provide you with some code and ask you to fill in TO-DO sections, so the key is to understand the Spark API. You must have some programming knowledge, since you will need to read the existing code and understand how to store and retrieve the results you get back from the API, but the focus will be on the Spark calls you add.
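To make that format concrete, here is a hypothetical sketch of the fill-in-the-blank style (invented for illustration, not taken from a real exam question). The template supplies the input parsing and output handling; the candidate supplies only the transformation in the marked spot. Plain Python stands in here so the sketch runs anywhere; on the exam the equivalent step would be a Spark call.

```python
# Hypothetical sketch of the exam's fill-in-the-blank style (invented, not a
# real question). Scaffolding is provided; you write only the marked part.
def solve(records):
    # Template-provided: records is a list of "category,amount" strings,
    # already read from disk for you.
    parsed = [(r.split(",")[0], float(r.split(",")[1])) for r in records]

    # TO-DO (the part you would write): total the amounts per category.
    # In Spark this would be a pair-RDD reduceByKey; here it is plain Python.
    totals = {}
    for category, amount in parsed:
        totals[category] = totals.get(category, 0.0) + amount

    # Template-provided: return results sorted for stable output.
    return sorted(totals.items())

print(solve(["food,10.0", "gas,20.0", "food,5.0"]))
# [('food', 15.0), ('gas', 20.0)]
```

The point is that the reading and bookkeeping are already written for you; practicing the API calls that go in the TO-DO spot is where your study time pays off.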
There is no study guide available for this exam, but Cloudera tells us exactly what we will be tested on. For example, the Required Skills at http://www.cloudera.com/content/www/en-us/training/certification/cca-spark.html tell us that we need to be able to join disparate datasets together using Spark. So go look up the join command in Spark. Write a quick five-line program in Scala, then write one in Python. Also, browse the online documentation so you know where to look up the API during the exam (remember, you will not have access to a search engine during the exam). Putting in effort before the exam will save you a lot of effort during the exam.
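As a refresher before you try it on a cluster: Spark's `join` operates on pair RDDs, and for inputs of shape `(K, V)` and `(K, W)` it returns `(K, (V, W))` for each matching key. The plain-Python sketch below (no Spark needed, data made up) models exactly the result that `names.join(orders)` would produce on two parallelized pair RDDs, so you can check your understanding of the semantics first:

```python
# Plain-Python model of Spark's pair-RDD inner join: for (K, V) and (K, W)
# inputs, join yields (K, (V, W)) once per matching (V, W) combination.
names = [(1, "alice"), (2, "bob"), (3, "carol")]   # like sc.parallelize(...)
orders = [(1, 250.0), (3, 75.0)]

def inner_join(left, right):
    # Index the right side by key, keeping all values per key.
    right_by_key = {}
    for k, w in right:
        right_by_key.setdefault(k, []).append(w)
    # Emit one output pair per matching combination, as Spark's join does;
    # keys with no match on the right (here, key 2) are dropped.
    return [(k, (v, w)) for k, v in left for w in right_by_key.get(k, [])]

print(inner_join(names, orders))
# [(1, ('alice', 250.0)), (3, ('carol', 75.0))]
```

Once the semantics are clear, writing the real five-line PySpark or Scala version is mostly a matter of knowing where `join` lives in the API docs.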
Can you tell us which version of Spark we will be tested on? I know that CDH 5.3.2 comes with Spark 1.4, but is it possible that by the time I take the exam the CDH version will have been upgraded and I will have the option to use a newer version of Spark (1.6)?
The current cluster is CDH 5.3.2. It supports Spark 1.2.0.
Changes to the version of the cluster will be announced on the Cloudera certification webpages.
An update was made to the Cloudera website to clarify the use of programming languages.
Exam Question Format
Each CCA question requires you to solve a particular scenario. In some cases, a tool such as Impala or Hive may be used. In other cases, coding is required. In order to speed up development time of Spark questions, a template is often provided that contains a skeleton of the solution, asking the candidate to fill in the missing lines with functional code. This template is written in either Scala or Python.
You are not required to use the template and may solve the scenario using a language you prefer. Be aware, however, that coding every problem from scratch may take more time than is allocated for the exam.
Impala is not given as part of the syllabus. Can you please clarify? Also, can you please tell me the exact versions of the software used in the distribution during the exam? CDH 5.3.2 is nowhere to be found on the site. Please help.
> Impala is not given as part of the syllabus.
Cloudera has everything about the exam at: http://www.cloudera.com/training/certification/cca-spark.html
Impala is part of the cluster and can be used during the test. I am not sure what syllabus you are looking at, but Impala is mentioned four times on this page.
> Also, can you please tell me the exact versions of the software used in the distribution during the exam?
The exact version of the cluster is CDH 5.3.2.
> CDH 5.3.2 is nowhere to be found on the site.
The link you are looking for is six inches higher in this thread: http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_vd_cdh_package_previous.html#conc...
Thanks for the reply. So based on what I have read, there won't be any need to write Java code (for MapReduce or any other purpose) in the exam anymore, correct? I understand we have the option to write code in Java, but we would need to write it from scratch, which will take longer, since the exam already includes skeleton code in Python and/or Scala.
Do we need both Scala and Python?
Does this mean we need to write the same program in both Scala and Python?
It would be great to have a practice exam environment or sandbox.
I have seen this on the website: "This template is written in either Scala or Python." According to this, we can choose the language. Is that correct?