
HDPCD Spark Exam

Contributor

I have some additional questions about the Spark exam that have not been answered by other questions here.

  1. The current sandbox is HDP 2.5. Is this the version used in the exam?
  2. HDP 2.5 comes with Spark 1.6 and 2.0. Can I choose which version I would like to use to solve the tasks? (2.0 supports reading and writing CSV files out of the box; see the sketch after this list.)
  3. Do I only have to use the Spark shell? If yes, why is there the exam objective "Initialize a Spark application"? When using the Spark shell I do not have to do that manually. Furthermore, there is "Run a Spark job on YARN". How is this tested?
  4. Do I have something like Ambari to look at Hive tables or the files in HDFS?
  5. Is Zeppelin available, and can I use it?
  6. Can I change the keyboard layout?
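
To illustrate the point in question 2: a minimal sketch of what "out of the box" CSV support looks like in the Spark 2.0 DataFrame API. The paths, options, and column layout here are hypothetical, not taken from the exam.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell 2.0 a SparkSession named `spark` already exists;
// in a standalone application you build it yourself.
val spark = SparkSession.builder().appName("csv-sketch").getOrCreate()

// Read a delimited file straight into a DataFrame -- no extra package needed.
val df = spark.read
  .option("header", "false")
  .option("delimiter", ",")
  .csv("/user/exam/input/data.csv") // hypothetical path

// Write the result back as CSV, possibly with a different delimiter.
df.write
  .option("delimiter", "|")
  .csv("/user/exam/output") // hypothetical path
```

In Spark 1.6, by contrast, you would typically fall back to `sc.textFile` plus manual splitting, or pull in the external spark-csv package.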

I do have project experience with Spark but feel quite uncomfortable not knowing what to expect in the exam.

1 ACCEPTED SOLUTION

Contributor

I had my exam three days ago. Let me answer my own question.

  1. I do not know which HDP version it was.
  2. The default version when running `spark-shell` in the terminal was Spark 1.6. I did not try to change it.
  3. Yes, I solved the tasks with Scala in the Spark shell. However, you have to save all your commands in the provided text files. It was not necessary to build a JAR manually and submit it, but there could be a task to submit a provided JAR to YARN. (A sketch of initializing a standalone application follows this list.)
  4. I do not know. You can use `hadoop fs` commands (e.g. `hadoop fs -ls`, `hadoop fs -cat`) in the terminal to browse HDFS.
  5. I do not think so.
  6. You do not have to. Since the VM runs in your browser, it automatically uses your local layout.
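
Regarding point 3 and the "Initialize a Spark application" objective: a minimal sketch, assuming Spark 1.6 (the default on the exam VM), of what that initialization looks like outside the shell. Names and structure are illustrative only.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// In spark-shell, `sc` and `sqlContext` are created for you.
// In a standalone application submitted to YARN you create them yourself:
object ExamApp {
  def main(args: Array[String]): Unit = {
    // The master (e.g. yarn) is usually supplied by spark-submit, not hard-coded.
    val conf = new SparkConf().setAppName("ExamApp")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)

    // ... solve the task here ...

    sc.stop()
  }
}
```

Such an application would then be packaged as a JAR and submitted with `spark-submit --master yarn`.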

Further information:

  • I think there were some links to the documentation on the desktop, but I did not use them.
  • You do not have to write CSV files with a header. Read carefully: the delimiter does not have to be the same in all tasks.
  • The general question pattern is: read this from here, do something with it, write the results to there (sketched below).
  • Because only the output counts, you have to read the tasks carefully (ordering of columns, sorting of the data, delimiters in CSV files, ...).
  • It is up to you how you solve the tasks. You can use the RDD or Spark SQL API.
  • The exam is not really difficult if you work through the exam objectives.
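
A hypothetical illustration of that read/transform/write pattern with the RDD API (Spark 1.6, using the shell's predefined `sc`). Paths, delimiters, and column positions are made up:

```scala
// Read this from here ...
val input = sc.textFile("/user/exam/input/orders") // hypothetical path

// ... do something with it ...
val result = input
  .map(_.split(","))                        // input delimiter: comma
  .filter(fields => fields(2) == "CLOSED")  // keep only matching records
  .map(fields => (fields(0).toInt, fields(1)))
  .sortByKey()                              // mind any required sort order
  .map { case (id, date) => s"$id|$date" }  // output delimiter: pipe

// ... write the results to there.
result.saveAsTextFile("/user/exam/output/closed_orders")
```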



New Contributor

I took my HDPCD Spark exam today; it lasted two hours and had seven questions in total.

Points from today's experience:

1. Go through all seven questions first, then prioritize them based on your comfort level.

2. Expect typical CentOS cluster behavior: shortcuts such as Ctrl+U and Ctrl+L won't work in the HDP cluster.

3. There is no select-to-copy paste like we are used to in a normal Unix box.

4. The cluster is very slow, so go with the recommended bandwidth of more than 20 Mbps.

5. PSI support may take a long time to solve environment issues such as the exam not getting delivered or DNS configuration in your browser. Please be ready for that.

6. As it is a time-constrained exam, please don't waste more than 15 minutes on any question.

7. Concentrate more on string parsing and input parsing. Remember, there is no time to stop and think there; otherwise we won't be able to complete five questions.

Personal experience:

People said in many places that we need to solve 4/7; it is actually 5/7 that we need to solve. Try to take the exam on a desktop machine with a decent monitor rather than on a laptop.

Don't compromise on bandwidth.

So overall, best of luck, and practice hard before an attempt.

Thanks,

Girish Pillai


I found the materials at this site to be the best to help me study: https://shoptly.com/sparkstudyguide