HDPCD Spark Certification

Contributor

1. Which version of the HDP sandbox is being used?

2. Which version of Spark is being used?

3. What IDE options are available for Python during the exam? Apart from the pyspark shell, is there any IDE such as IPython or Zeppelin? Is there any option that offers auto-suggestion and lets us submit jobs to the cluster? Please advise.

4. I have read in a few posts in the Hortonworks community that we may use either Spark RDDs or Spark DataFrames to accomplish the tasks. Please confirm.

5. What is the pass percentage on average?

1 ACCEPTED SOLUTION

Master Collaborator

The test environment runs on an AWS virtual machine. When I took the test it was HDP 2.3; I am not sure which version is used now, but you can practice on the current sandbox. Spark was something later than 1.4, probably 1.5. The material covered is all basic RDD and DataFrame usage that does not depend much on newer versions. The test environment has no IDE: you edit with gedit or vi, based on your preference, and debug with spark-shell or pyspark.

A couple of notes on the exam:

1. Know the RDD and DataFrame APIs well. Go through all the docs linked from the exam web page.

2. Know how to import and export RDDs/DataFrames from/to CSV files.

3. There is no restriction on how you complete a task, so choose the technique you are most familiar with, either the API or Spark SQL.

4. The test environment is quite slow to respond, so be patient with it and leave enough time for the tasks.
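
For note 2, one way to import CSV data without relying on any external package is to pass plain Python functions to RDD operations. The following is only a sketch under stated assumptions: the helper names `parse_csv_line` and `to_record`, the `name,age` layout, and the path `people.csv` are hypothetical and not taken from the exam. The Spark calls appear as comments so the helpers can be checked locally first.

```python
import csv

def parse_csv_line(line):
    """Split one CSV line into a list of string fields, honouring quoted commas."""
    return next(csv.reader([line]))

def to_record(fields):
    """Convert ['alice', '30'] into a typed tuple (hypothetical name,age schema)."""
    return (fields[0], int(fields[1]))

# Local check without Spark: skip the header, then parse and convert each line.
lines = ['name,age', 'alice,30', '"smith, bob",41']
header = lines[0]
records = [to_record(parse_csv_line(l)) for l in lines if l != header]
print(records)

# In the pyspark shell (Spark 1.x, where `sc` and `sqlContext` are predefined)
# the same helpers would be applied like this:
#   raw = sc.textFile("people.csv")            # hypothetical path
#   header = raw.first()
#   rdd = raw.filter(lambda l: l != header).map(parse_csv_line).map(to_record)
#   df = sqlContext.createDataFrame(rdd, ["name", "age"])
```

Using the `csv` module rather than `line.split(",")` keeps quoted fields that contain commas intact.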

Good luck taking the exam.

14 REPLIES

Contributor

@Qi Wang: Could you please help with the above questions?

Contributor

@Qi Wang

Thank you for your prompt response. Could you please help with the questions below?

The current sandbox version is HDP 2.5 and the supported Spark version is 1.6.2.

1. In the sandbox I downloaded, only vi is available; there is no gedit. Do we need to install gedit?

2. I have learned that the Apache Spark documentation and the Hortonworks Spark documentation are available during the exam.

Apache Spark documentation: https://spark.apache.org/docs (is this the right link?)

Hortonworks Spark documentation: what is the link?

Thanks in advance.

Master Collaborator

The exam is set up on Ubuntu, with a CentOS VM running HDP. gedit is on the Ubuntu side. You could install gedit in your own environment, but it is very easy to use, so don't worry about it. If you really want to try the test environment, use the HDPCD practice exam, which is very similar:

http://hortonworks.com/wp-content/uploads/2015/02/HDPCD-PracticeExamGuide1.pdf

The documentation will be accessible during the exam. The Apache documentation is at the link you gave: http://spark.apache.org/docs The Hortonworks documentation is under http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/index.html Stick with the Apache documentation, as the exam does not really cover anything Hortonworks-specific.

There is no way to change the exam environment. You have very limited permissions.

Contributor

@Qi Wang Thanks a lot, that really helps.

Contributor

@Qi Wang

3. Is there any way to get IntelliSense/auto-completion working in the HDP environment for Spark in Python, either in vi/gedit or in the pyspark shell?

Thanks in advance.
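
For the pyspark shell specifically, one possibility is the tab completion that ships with the Python standard library. This is a sketch that assumes the shell is the standard interactive Python interpreter and that the `readline` module is available there; neither is confirmed for the exam machine. The name `rdd_example` is a made-up variable used only to show a completion.

```python
import readline
import rlcompleter

# Bind the Tab key to completion for the current interactive session.
readline.parse_and_bind("tab: complete")

# The completer can also be queried directly, which shows what Tab would offer.
# `rdd_example` is a hypothetical name placed in the completer's namespace.
completer = rlcompleter.Completer({"rdd_example": [1, 2, 3]})
first_match = completer.complete("rdd_ex", 0)
print(first_match)
```

Typing the two setup lines at the shell prompt affects only that session and installs nothing.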

Contributor

@Qi Wang

In Spark 1.6.*, RDDs and DataFrames only have functions to write to the formats below:

rdd.saveAsTextFile / saveAsSequenceFile

df.write.orc / json / parquet / text / saveAsTable

Question: I assume we cannot download other CSV packages (e.g. the Databricks one) during the test. Is there any way to write the output file in CSV format? Please advise.

Thanks in advance.
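
One package-free approach is to format each row as a CSV line in plain Python and write the result with `saveAsTextFile`. This is a minimal sketch, assuming Python 3; the helper name `to_csv_line` and the output directory `out_dir` are hypothetical. The Spark call is shown as a comment so the helper can be checked locally.

```python
import csv
import io

def to_csv_line(row):
    """Format one row (tuple, list or Row) as a single CSV line, quoting where needed."""
    # Assumes Python 3; on Python 2, io.StringIO accepts only unicode text,
    # so StringIO.StringIO would be needed instead.
    buf = io.StringIO()
    csv.writer(buf, lineterminator="").writerow(row)
    return buf.getvalue()

# Local check without Spark: the comma inside the first field forces quoting.
line = to_csv_line(("smith, bob", 41))
print(line)

# In pyspark, for a DataFrame `df`:
#   df.rdd.map(to_csv_line).saveAsTextFile("out_dir")   # hypothetical directory
# saveAsTextFile writes a directory of part files, not a single file,
# and no header line is written unless one is added separately.
```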

Contributor

@Qi Wang

Is it safe to assume that the Databricks package will be available during the test to read and write CSV files?

  pyspark --packages com.databricks:spark-csv_2.10:1.4.0
  df.write.format("com.databricks.spark.csv").option("header","true").save("file.csv")