Created 05-27-2017 05:06 AM
Hi,
I am a CCA Spark and Hadoop Certification aspirant. As per the revised syllabus for CCA 175, do we need to prepare Spark with both Scala and Python, or is Spark with Scala enough for the certification?
The previous syllabus mandated Spark with both Scala and Python, but the new syllabus does not provide clarity on this.
Created 05-30-2017 05:13 AM
@ashishdiwan03 wrote:
Hi All,
I had an email conversation with Cloudera Certification about the same query; below is the response I got from the Cloudera certification team:
During the exam, it is possible to get templates so that not all coding is done from scratch. In order to speed up development time on Spark questions, a template is often provided that contains a skeleton of the solution, asking the candidate to fill in the missing lines with functional code. This template is written in either Scala or Python.
According to the certification team, knowledge of both Scala and Python is a must for the examination.
To clarify a bit on the Scala and Python questions here is a snippet from our community knowledge article Cloudera Certifications FAQ:
Q - Do I need to know both Scala and Python for the CCA Spark and Hadoop Developer (CCA 175) Certification?
A - The answer is yes; there are questions using both languages.
However, please remember that the goal of the exam is to test your Spark knowledge, not your Scala and Python knowledge. The development questions typically provide you some code and ask you to fill in TO-DO sections. So the key is to understand the Spark API. You must have some knowledge of programming, as you will need to be able to read the existing code and understand how to store and retrieve the results you get back from calling the API, but the focus will be on adding the Spark calls.
Created 06-07-2017 10:59 AM
You need to apply the Scala plug-in in Eclipse (Scala Perspective); Scala is not a default one.
Created 06-07-2017 03:02 PM
@saranvisa, yeah sure, no problem with that.
My question is whether the plugin will come pre-installed on the cluster available for the test,
because I think external access will not be available, right?
Created 06-08-2017 07:38 AM
You don't need to use IDEs like Eclipse, IntelliJ, etc., and there is no need to create a JAR as part of the exam. You can log in to spark-shell (or pyspark) and execute your commands one by one. All they need is your result, and this will save you time.
Created 01-30-2018 03:31 AM
I understand this is an old thread, hopefully my reply here will still get attention.
My questions are:
1. The above discussion is confusing: the "marked" answer says a template will be provided (though not mandated to be used), while another user says multiple channels confirmed to him/her that no template was provided in the actual exam. Which is true?
2. Is Flume excluded from the exam?
3. Is Internet browsing, like Google, allowed in the exam? I ask because we often need to check the documentation/user guide for each component, like Spark, Hive, etc.
Thank you.
Created 01-30-2018 05:34 AM
Hi @axie,
If you look at the CCA175 certification page, you will see the following wording on templates:
Each CCA question requires you to solve a particular scenario. In some cases, a tool such as Impala or Hive may be used. In other cases, coding is required. In order to speed up development time of Spark questions, a template is often provided that contains a skeleton of the solution, asking the candidate to fill in the missing lines with functional code. This template is written in either Scala or Python.
You are not required to use the template and may solve the scenario using a language you prefer. Be aware, however, that coding every problem from scratch may take more time than is allocated for the exam.
As for Flume and the availability of documentation, you can see at the bottom of that page (I put the Flume items in red):
CCA175 is a remote-proctored exam available anywhere, anytime. See the FAQ for more information and system requirements.
CCA175 is a hands-on, practical exam using Cloudera technologies. Each user is given their own CDH5 (currently 5.10.0) cluster pre-loaded with Spark 1.6, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others (See a full list). In addition the cluster also comes with Python (2.6, 2.7, and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.
Cloudera Product Documentation
Apache Hadoop
Apache Hive
Apache Impala (Incubating)
Apache Sqoop
Spark
Apache Crunch
Apache Pig
Kite SDK
Apache Avro
Apache Parquet
Cloudera HUE
Apache Oozie
Apache Flume
DataFu
JDK 7 API Docs
Python 2.7 Documentation
Python 3.4 Documentation
Scala Documentation
Only the documentation, links, and resources listed above are accessible during the exam. All other websites, including Google/search functionality is disabled. You may not use notes or other exam aids.
Created 01-30-2018 12:54 PM
Thanks Mr. Jervis,
There have been a lot of complaints about the cluster/VM performance. Do you have any update from the certification department? I know you have been talking to them about those complaints.
Created 01-30-2018 12:59 PM
The certification team advised me that they have reviewed each of the exams they were contacted about and didn't find any performance issues.
Created 02-12-2018 07:22 PM
I am doing some practice. For some questions, there could be various solutions; for example, I can use RDD operations to do filtering, sorting, and grouping, while with DataFrame and Spark SQL it is even easier for me to get the same result.
My question is: will there be a requirement in the exam that some questions must be solved using RDDs rather than DataFrame + Spark SQL, or vice versa?
Thank you.