Created 12-06-2016 07:13 PM
I have some additional questions about the Spark exam that have not been answered by other posts here.
I do have project experience with Spark but feel quite uncomfortable not knowing what to expect in the exam.
Created 12-20-2016 07:38 AM
I had my exam three days ago. Let me answer my own question.
Further information:
Created 12-07-2016 06:12 PM
I had almost the same question in mind. Adding to that, will I have access to the Spark documentation, or do I have to write code from memory?
Created 12-07-2016 06:43 PM
Let me answer what I can...
First, when I took it, the cluster was HDP 2.4, but I doubt it has changed much, if at all, in the four months since.
1. Essentially the interface is the same as what you get in the Hive practice exam. You get a Linux host to code on. I recommend writing your jobs in the default text editor, then submitting them through the command line.
2. Look at the Spark documentation for the command-line switches that tell you how to submit a job on YARN.
3. For documentation, there will be links to the Spark documentation and the Python documentation. There's probably a link for Scala as well, but I use Python. I'm told it's the same as the Hive practice exam, where you also get the documentation. You still need to know the material, though, because you don't have time to do a lot of reading. I recommend knowing your way around the documentation pretty well.
4. For #4 and #5, no. You are stuck with submitting through the command line. I thought this was odd, because the HDP Spark training was all about Zeppelin, but I think it has to do with how they grade it.
5. I'm not sure why they haven't posted more about the exam. There is a practice exam for Hive, which gets you used to the environment, but not one for Spark. I think they are probably working on that, but it takes time. I haven't tried the Hive practice myself, but it would probably be worthwhile to look at.
6. I think it would be impractical to write these jobs in the interactive shell. First, I'm not sure how they would grade it, and second, you need to be able to start over and rerun everything. You will be time-constrained.
7. Since you may get HDP 2.4, I'd be prepared to write a CSV file without Spark 2.0's built-in CSV support, in case it is an exam topic. It takes a while for these exams to change, I think.
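On point 2, the submission itself comes down to a handful of standard spark-submit switches. As a rough sketch (the script name, executor counts, and memory settings below are my own placeholders, not anything from the exam; the flags themselves are standard Spark options), assembled as a Python list so each switch is easy to see:

```python
# Sketch of a spark-submit invocation for YARN. The flags are standard
# spark-submit options; the script name and resource numbers are hypothetical.
yarn_submit = [
    "spark-submit",
    "--master", "yarn",          # run on the YARN cluster, not locally
    "--deploy-mode", "client",   # keep the driver on the host you're logged into
    "--num-executors", "2",      # modest resources; tune per task
    "--executor-memory", "1g",
    "my_job.py",                 # hypothetical: your saved job script
]
print(" ".join(yarn_submit))
# prints: spark-submit --master yarn --deploy-mode client --num-executors 2 --executor-memory 1g my_job.py
```

Saving the job as a script and submitting it like this (rather than typing into the shell) also makes it trivial to start over and rerun everything, which matters under time pressure.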
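On point 7: HDP 2.4 ships Spark 1.6, which has no DataFrame CSV writer (that arrived in 2.0). One common fallback is to format each row as a CSV line yourself and write with saveAsTextFile(). A minimal sketch of the per-row formatting, in plain Python so it can be tested outside Spark (the sample rows are made up):

```python
import csv
import io

def row_to_csv_line(row):
    """Format one record as a CSV line, quoting fields that need it.
    On Spark 1.6 you would use this roughly as
        rdd.map(row_to_csv_line).saveAsTextFile(output_path)
    since df.write.csv() does not exist before Spark 2.0."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue().rstrip("\r\n")

# Made-up sample rows:
print(row_to_csv_line((1, "plain")))        # prints: 1,plain
print(row_to_csv_line((2, "needs,quote")))  # prints: 2,"needs,quote"
```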
I don't think I've exposed any secrets here that they wouldn't want you to know going in, or that they didn't already expose through the Hive certification practice exam.
Created 12-07-2016 07:34 PM
Thank you very much, @Don Jernigan. Your answer helps me a lot. However, I have some further questions.
Created 12-07-2016 07:52 PM
@rich You have answered other questions regarding the Spark exam. We would be very grateful if you could answer some questions here.
Created 05-25-2017 02:37 PM
@Stefan Frankenhauser: For #3, are we supposed to just write Scala code that will run in the Spark shell, or do we need to create an object/class and define a SparkContext, SQLContext, etc. ourselves?
Created 05-26-2017 05:37 AM
You write your code in the Spark shell. A SparkContext (sc) and SQLContext (sqlContext) are already available.
Created 07-08-2017 01:35 AM
Can we use only the Spark SQL API to solve all tasks?
Created 07-10-2017 08:55 AM
No, I don't think so. You also need some RDD knowledge, for example to read a CSV file and transform it into a DataFrame.
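To make that concrete: Spark 1.6 has no built-in CSV reader, so the usual route is sc.textFile(), a map() to parse each line, and then toDF(). The parsing step is plain Python and can be sketched on its own (the file path, column names, and types below are assumptions for illustration):

```python
def parse_csv_line(line):
    """Split one simple (unquoted) CSV line into a typed tuple.
    On Spark 1.6 you'd use it roughly like:
        rows = sc.textFile("people.csv").map(parse_csv_line)  # hypothetical path
        df = rows.toDF(["name", "age"])                       # assumed columns
    """
    name, age = line.split(",")
    return (name.strip(), int(age))

# Made-up sample line:
print(parse_csv_line("Alice, 34"))  # prints: ('Alice', 34)
```

A real exam file might need more care (headers, quoting, bad rows), but the textFile-map-toDF pattern is the RDD knowledge being referred to here.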