Member since: 11-25-2016
Posts: 10
Kudos received: 4
Solutions: 1
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 5646 | 12-20-2016 07:38 AM |
07-10-2017
08:55 AM
No, I don't think so. You also need some RDD knowledge, for example to read a CSV file and convert it into a DataFrame.
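A minimal sketch of that RDD step, assuming a hypothetical comma-separated input of `id,name,age` records without a header. The parsing is plain Scala so it can be tried outside Spark; `Person` and `parseLine` are illustrative names, not part of any exam task.

```scala
import java.util.regex.Pattern

// Hypothetical record layout for the input file: id,name,age (no header line).
case class Person(id: Int, name: String, age: Int)

// Parse one delimited line into a Person. The delimiter is a parameter
// because it can differ from task to task.
def parseLine(line: String, delimiter: String = ","): Person = {
  // String.split takes a regular expression, so quote the delimiter
  // to make characters such as "|" behave literally.
  val fields = line.split(Pattern.quote(delimiter), -1).map(_.trim)
  Person(fields(0).toInt, fields(1), fields(2).toInt)
}

// Plain-Scala stand-in for the lines returned by sc.textFile(...).
val lines = Seq("1,Alice,34", "2,Bob,27")
val people = lines.map(line => parseLine(line))
people.foreach(println)
```

In the Spark 1.6 shell the same function would be applied as `sc.textFile(inputPath).map(line => parseLine(line)).toDF()`, where `inputPath` is a placeholder. If `toDF()` is not found, `import sqlContext.implicits._` brings it into scope.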
05-26-2017
05:37 AM
You write your code in the Spark shell, where a SparkContext (`sc`) and a SQLContext (`sqlContext`) are already available.
12-20-2016
07:38 AM
3 Kudos
I had my exam three days ago, so let me answer my own questions.
- HDP version: I do not know which HDP version it was.
- Spark version: Running `spark-shell` in the terminal started Spark 1.6 by default. I did not try to change it.
- Spark shell: Yes, I solved the tasks with Scala in the Spark shell. However, you have to save all your commands in the provided text files.
- Spark jobs on YARN: It was not necessary to build a JAR manually and submit it. There could, however, be a task to submit a provided JAR to YARN; I do not know.
- Ambari / HDFS: You can use `hadoop fs` commands in the terminal to browse HDFS.
- Zeppelin: I do not think so.
- Keyboard layout: You do not have to change it. Since the VM runs in your browser, it automatically uses your local layout.

Further information:
- I think there were some links to the documentation on the desktop, but I did not use them.
- You do not have to write CSV files with a header.
- Read carefully: the delimiter is not necessarily the same in all tasks.
- The general question pattern is: read this from here, do something with it, and write the results to here. Because only the output counts, read each task carefully (column order, sorting of the data, delimiter in CSV files, ...).
- It is up to you how to solve the tasks. You can use the RDD API or Spark SQL.

The exam is not really difficult if you work through the exam objectives.
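Because only the output is checked, details such as column order, sort order and delimiter decide the result. A minimal sketch of the read/transform/write pattern in plain Scala, assuming a hypothetical task: input lines are `name;city;score` separated by semicolons, and the output must contain `city,name` for every score above 50, highest score first, comma-separated, with no header. `Record`, `parse` and `solve` are illustrative names.

```scala
// Hypothetical task: input lines look like "name;city;score" (semicolon-delimited,
// no header). Output "city,name" for every score above 50, highest score first,
// comma-delimited, no header.
case class Record(name: String, city: String, score: Int)

def parse(line: String): Record = {
  val f = line.split(";", -1).map(_.trim)
  Record(f(0), f(1), f(2).toInt)
}

def solve(input: Seq[String]): Seq[String] =
  input
    .map(parse)                                   // read: text line -> Record
    .filter(_.score > 50)                         // transform: keep scores above 50
    .sortBy(r => -r.score)                        // sort: highest score first
    .map(r => Seq(r.city, r.name).mkString(","))  // write: city before name, comma-delimited

val output = solve(Seq("Ann;Berlin;70", "Ben;Paris;40", "Cem;Rome;90"))
output.foreach(println)
```

In the Spark shell the same chain of calls would start from `sc.textFile(inputPath)` and end with `saveAsTextFile(outputPath)` (both paths are placeholders); on an RDD the descending sort can also be written as `sortBy(_.score, ascending = false)`.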
12-20-2016
07:13 AM
I had my exam three days ago. Since the VM runs in your browser, you do not have to change the keyboard layout. The VM automatically uses your local one.
12-13-2016
04:02 PM
Thank you, @William Gonzalez. I assume this works for every exam, including the HDPCD Spark exam, right?
12-12-2016
07:04 PM
I assume the keyboard layout of the HDPCD exam environment is US. Since I am used to working with a different keyboard layout (German, DE), I would like to change it in the exam. Is it possible to change it, for example in the terminal with `loadkeys de`?
12-07-2016
07:52 PM
@rich You have answered other questions regarding the Spark exam. We would be very grateful if you could answer some questions here.
12-07-2016
07:34 PM
Thank you very much, @Don Jernigan. Your answer helps me a lot. However, I have further questions. Using Python, it is simple to submit a job to YARN, because you need nothing more than a .py file. But if I want to use Scala, I have to build a .jar file with Maven, sbt or a similar tool, and I am not sure whether these build tools are available in the exam. Did anyone use Scala in the exam? Do I have to write CSV files with a header line describing the column names? If so, I think that is not easy in a distributed environment. Is the general question pattern "Read this file(s), do something with it and write the result to here"? In the end, only the results will be checked.
12-06-2016
07:13 PM
1 Kudo
I have some additional questions about the Spark exam that have not been answered in other threads here.
The current sandbox is HDP 2.5. Is this the version used in the exam? HDP 2.5 comes with Spark 1.6 and 2.0. Can I choose which version to use for the tasks? (Spark 2.0 can read and write CSV files out of the box.) Do I have to use only the Spark shell? If so, why is there an exam objective "Initialize a Spark application"? When using the Spark shell, I do not have to do that manually. Furthermore, there is the objective "Run a Spark job on YARN". How is this tested? Is there something like Ambari for looking at Hive tables or files in HDFS? Is Zeppelin available, and can I use it? Can I change the keyboard layout? I have project experience with Spark, but I feel quite uncomfortable not knowing what to expect in the exam.
Labels: Apache Spark
12-06-2016
06:42 PM
@Qi Wang I think the Databricks CSV library (spark-csv) is not available in the exam. Your approach with mkString() works well if no header is required in the output CSV file. Can I assume that for the exam tasks?
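A minimal sketch of the `mkString()` approach, assuming the rows are tuples of a hypothetical `(Int, String, Double)` shape. Each row becomes one delimited line, and because nothing is prepended, the output has no header line. `toCsvLine` is an illustrative helper, not a Spark API.

```scala
// Turn a tuple (or any case class) into one delimited text line.
// productIterator yields the fields in declaration order.
// Note: mkString does no quoting, so fields must not contain the delimiter.
def toCsvLine(row: Product, delimiter: String = ","): String =
  row.productIterator.mkString(delimiter)

// Hypothetical rows; in Spark these would be the elements of an RDD.
val rows = Seq((1, "Alice", 3.5), (2, "Bob", 4.0))
// No header is prepended, so the first line is already data.
val lines = rows.map(r => toCsvLine(r))
lines.foreach(println)
```

With Spark, `rdd.map(r => toCsvLine(r)).saveAsTextFile(outputPath)` would write one part file per partition; `outputPath` is a placeholder.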