Support Questions

HDPCD Spark Certification

1. Which version of the HDP sandbox is used in the exam?

2. Which version of Spark is used?

3. What IDE options are available for Python during the exam? Apart from the pyspark shell, is there an IDE such as IPython or Zeppelin? Is there any IDE with auto-completion from which we can submit jobs to the cluster? Please advise.

4. I have read a few posts in the Hortonworks community saying that we may use either Spark RDDs or Spark DataFrames to accomplish the tasks. Please confirm.

5. What is the average pass percentage?

Accepted Solution

Re: HDPCD Spark Certification

Expert Contributor

The test environment runs on an AMS virtual machine. When I took the test it was HDP 2.3, and I am not sure which version is used now; you can use the current sandbox for practice. Spark is something later than 1.4, probably 1.5, but the material covered is basic RDD and DataFrame usage that does not depend much on newer versions. The test environment has no IDE: you use either gedit or vi, based on your preference, and debug with spark-shell or pyspark.

A couple of notes on the exam:

1. Know the RDD and DataFrame APIs well. Go through all the docs linked on the exam web page.

2. Know how to import and export RDDs/DataFrames from/to CSV files.

3. There is no restriction on how you finish a task, so choose the technique you are most familiar with, either the API or Spark SQL.

4. The test environment is quite slow to respond, so be patient with it and leave enough time for the tasks.
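For note 2, on Spark 1.6 without the spark-csv package, one common approach is to load the file with `sc.textFile` and split each line with Python's standard `csv` module, which keeps quoted fields containing commas in one piece. A minimal sketch, assuming a comma-delimited file with a single header row; `parse_csv_line` and the HDFS path are hypothetical, and the pyspark lines are left as comments because they need a running shell (where `sc` and `sqlContext` already exist).

```python
import csv


def parse_csv_line(line):
    """Split one CSV text line into a list of string fields.

    csv.reader expects an iterable of lines, so the single line is wrapped
    in a list; a quoted field such as "Smith, John" stays in one piece.
    Quoted fields spanning several lines are NOT handled, because
    sc.textFile has already split the input on newlines.
    """
    return next(csv.reader([line]))


# Hypothetical usage inside the pyspark shell (path and columns are assumptions):
# lines = sc.textFile("/user/exam/input.csv")
# header = lines.first()
# rows = lines.filter(lambda l: l != header).map(parse_csv_line)
# df = rows.toDF(parse_csv_line(header))   # column names taken from the header
# df.registerTempTable("input")            # then query it with sqlContext.sql(...)
```

All columns come back as strings, so cast them afterwards if the task needs numbers. Under Python 2 the `csv` module does not accept non-ASCII unicode input, so such data may need encoding first.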

Good luck taking the exam.

Re: HDPCD Spark Certification

@Qi Wang: Could you please help with the above queries?

Re: HDPCD Spark Certification

@Qi Wang

Thank you for your prompt response. Could you please help with the queries below?

The current sandbox version is HDP 2.5, and the supported Spark version is 1.6.2.

1. In the sandbox I downloaded, only vi is available; there is no gedit. Do we need to install gedit?

2. I have learned that the Apache Spark documentation and the Hortonworks Spark documentation are available during the exam.

Apache Spark documentation: https://spark.apache.org/docs (is this the right link?)

Hortonworks Spark documentation: what is the link for the Hortonworks Spark documentation?

Thanks in advance.

Re: HDPCD Spark Certification

Expert Contributor

The exam is set up on Ubuntu, with a CentOS VM running HDP; gedit is on the Ubuntu side. I guess you could install gedit in your own environment, but it is very easy to use, so don't worry about it. If you really want to try out the test environment, take the HDPCD practice exam; it is very similar.

http://hortonworks.com/wp-content/uploads/2015/02/HDPCD-PracticeExamGuide1.pdf

The documentation will be accessible during the exam. For Apache, it is the link you used: http://spark.apache.org/docs. The Hortonworks documentation is under http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/index.html. Stick with the Apache documentation, as the exam is not really Hortonworks-specific.

There is no way to change the exam environment; you have very limited permissions.

Re: HDPCD Spark Certification

@Qi Wang Thanks a lot, that really helps.

Re: HDPCD Spark Certification

@Qi Wang

3. Is there any way to enable IntelliSense-style auto-completion for Spark in Python in the HDP environment, either in vi/gedit or in the pyspark shell?

Thanks in advance.
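On question 3: the pyspark shell is a regular interactive Python interpreter, so the standard `readline` and `rlcompleter` modules can usually provide <Tab> completion of names and attributes such as `sc.textFile`. Python 3.4 and later enable this by default in interactive mode; on Python 2 you bind it yourself. A minimal sketch, assuming the shell's Python was built with `readline` (typical on Linux, not on Windows); `enable_tab_completion` and `completions` are hypothetical helper names, and the second one only lists what <Tab> would offer. None of this helps inside gedit or vi.

```python
import rlcompleter


def enable_tab_completion():
    """Bind <Tab> to rlcompleter in the current interactive session.

    Returns False when the readline module is unavailable (e.g. Windows).
    """
    try:
        import readline
    except ImportError:
        return False
    # With no namespace argument, Completer looks up names in __main__,
    # i.e. the interactive session's globals (sc, sqlContext, ...).
    readline.set_completer(rlcompleter.Completer().complete)
    readline.parse_and_bind("tab: complete")
    return True


def completions(text, namespace):
    """List every candidate rlcompleter would offer for `text` in `namespace`."""
    completer = rlcompleter.Completer(namespace)
    found = []
    state = 0
    while True:
        # complete() returns the state-th candidate, or None when exhausted.
        match = completer.complete(text, state)
        if match is None:
            break
        found.append(match)
        state += 1
    return found


# Hypothetical usage: paste into pyspark, then type "sc.te" and press <Tab>.
# enable_tab_completion()
# completions("sc.te", {"sc": sc})
```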

Re: HDPCD Spark Certification

@Qi Wang

In Spark 1.6.x, RDDs/DataFrames only have functions to write to the formats below:

rdd.saveAsTextFile / saveAsSequenceFile

df.write.orc / json / parquet / text / saveAsTable

Query: I am sure we cannot download other CSV packages (e.g., the Databricks one) during the test. Is there any way to write the output file in CSV format? Please advise.

Thanks in advance.
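Without the spark-csv package, one way to get CSV output from Spark 1.6 is to format each record as a CSV line yourself and write the result with `saveAsTextFile`, which produces a directory of `part-*` text files rather than a single file. A minimal sketch, assuming comma delimiters and RFC 4180-style minimal quoting (a field is quoted only when it contains the delimiter, a double quote or a line break); `to_csv_line` and the output paths are hypothetical, and the quoting is done by hand to sidestep the `csv` module's differences between Python 2 and 3.

```python
def to_csv_line(fields, sep=","):
    """Render one record (list, tuple or pyspark Row) as a CSV line.

    A field is wrapped in double quotes only if it contains the separator,
    a double quote or a line break; embedded quotes are doubled.
    None becomes an empty field. No trailing newline is added, because
    saveAsTextFile writes each element as its own line.
    """
    out = []
    for value in fields:
        text = "" if value is None else str(value)
        if sep in text or '"' in text or "\n" in text or "\r" in text:
            text = '"' + text.replace('"', '""') + '"'
        out.append(text)
    return sep.join(out)


# Hypothetical usage inside the pyspark shell (output paths are assumptions):
# rdd.map(to_csv_line).saveAsTextFile("/user/exam/output_csv")
# df.rdd.map(to_csv_line).saveAsTextFile("/user/exam/df_csv")  # Row is a tuple
```

No header line is written; if the task requires one, it has to be added separately.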

Re: HDPCD Spark Certification

Expert Contributor

Re: HDPCD Spark Certification

@Qi Wang

Is it safe to assume that the Databricks package will be available during the test for reading and writing CSV files?

  1. pyspark --packages com.databricks:spark-csv_2.10:1.4.0
  2. df.write.format("com.databricks.spark.csv").option("header","true").save("file.csv")