Explorer
Posts: 16
Registered: ‎12-30-2013

CCA175 tools on the cluster

Dear Cloudera Certification Team,

 

I need some information on the CCA175 exam setup:

 

1. Will the cluster already be started? If not, will Cloudera Manager be available to start it? Will there be any assistance for operational issues we might face during the exam, such as a node going down?

 

2. What editors will be available for writing Spark and Python code?

 

3. Will the exam require both Python and Spark, or will either be enough?

 

4. How large will the data be? Will processing on the cluster consume a lot of time?

 

5. If we create a Maven project in Eclipse, will it be able to connect to the internet to fetch dependencies, or is there an offline repository?

 

6. Are we allowed to write on paper during the exam to work through our thought process?

 

7. Are we allowed to drink tea, coffee, or water during the exam?

 

8. What should we do if we lose internet connectivity during the exam?

 

9. How are marks awarded, i.e. are there marks for individual steps? Would I get marks if only the final step has an issue?

 

10. Are there any optional questions?

 

11. Are the questions dependent on one another? For example, the first question may cover only the ingestion part and the second may require processing the ingested data. If I know how to process the data but run into an issue in the ingestion step, will I be unable to do the second question as well?

 

 

Thanks

 

 

 

 

 

Cloudera Employee
Posts: 75
Registered: ‎12-21-2015

Re: CCA175 tools on the cluster

Most of the information you are looking for is on Cloudera's website.

http://www.cloudera.com/content/www/en-us/training/certification/cca-spark.html

 

1.  Each user is given their own CDH5 (currently 5.3.2) cluster.  Everything will be running.  There will be a proctor in case of technical issues.

 

2.  In addition, the cluster comes with Python (2.6 and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans.

 

 

3.  This includes writing Spark applications in both Scala and Python

 

4.  You have a cluster of four worker nodes.  Data sizes will be in the millions of records.  Processing time should be on the order of a minute or two if done correctly.

 

5.  There is no internet connectivity other than the provided websites.

 

6.  You cannot use paper notes during the exam.  You can take electronic notes if you need them.

 

7.  No eating or drinking during the exam.

 

8.  It is the sole responsibility of the test taker to maintain connectivity throughout the exam session. If connectivity is lost, for any reason, it is the responsibility of the test taker to reconnect and finish the exam within the scheduled time slot. No refunds or retakes will be given. Unfinished or abandoned exam sessions will be scored as a fail.

 

9.  In most cases, a CCA exam question tests a single learning objective, so there is little need for partial credit.  If the question asked you to use Sqoop, either the data is present on the cluster or it is not.

 

10.  There are no optional questions.

 

11.  There are no dependencies between questions.

Explorer
Posts: 16
Registered: ‎12-30-2013

Re: CCA175 tools on the cluster

Hi

 

Thanks for the reply.

 

"This includes writing Spark applications in both Scala and Python"

 

I prefer using Sublime with Maven to build. Even with Eclipse (I am not sure whether the Scala plugin is available), Maven would be used to build the code.

 

Which Maven repositories should be specified, i.e. would the ones below work?

 

<repositories>
  <repository>
    <id>scala-tools.org</id>
    <name>Scala-tools Maven2 Repository</name>
    <url>http://scala-tools.org/repo-releases</url>
  </repository>
  <repository>
    <id>maven-hadoop</id>
    <name>Hadoop Releases</name>
    <url>https://repository.cloudera.com/content/repositories/releases/</url>
  </repository>
  <repository>
    <id>cloudera-repos</id>
    <name>Cloudera Repos</name>
    <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
  </repository>
</repositories>

<pluginRepositories>
  <pluginRepository>
    <id>scala-tools.org</id>
    <name>Scala-tools Maven2 Repository</name>
    <url>http://scala-tools.org/repo-releases</url>
  </pluginRepository>
</pluginRepositories>

Cloudera Employee
Posts: 75
Registered: ‎12-21-2015

Re: CCA175 tools on the cluster

Sublime is installed on the cluster.  Maven is installed on the cluster.  The Scala plugin for Eclipse is not installed.

 

Your solution would not work.  We block Internet access to all sites except the ones we have listed, so Maven cannot download anything from those repositories.

 

But everything that you need to build should already be there.
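
As an illustration of building without Internet access, here is a minimal sketch of a Maven settings file that forces offline mode. It assumes the required artifacts are already in the local repository (by default ~/.m2/repository); whether the exam cluster is actually configured this way is not stated in this thread.

```xml
<!-- ~/.m2/settings.xml: hypothetical sketch, not the exam cluster's actual configuration -->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
  <!-- Resolve dependencies only from the local repository; do not contact remote repositories -->
  <offline>true</offline>
</settings>
```

The same effect can be had for a single build by passing the -o (--offline) flag, e.g. mvn -o package. In offline mode the build fails if an artifact is missing from the local repository, rather than trying to download it.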

New Contributor
Posts: 7
Registered: ‎02-12-2018

Re: CCA175 tools on the cluster

Hi @MRhadoop

I've also been dealing with these questions.

I've been through the entire forum and have gathered the following:

 

- You are not expected to compile your code in Eclipse or the sandbox.

- Prepare to use spark-shell or pyspark for all Spark scenarios.

- You will not need Maven repositories, as they would be blocked.

 

I guess it's done this way because, given the limited time allocated for this test, anything can go wrong while downloading from repositories, etc. They may simply be trying to reduce external variables as much as possible.

Hope this helps.

And good luck.
