Created 10-31-2016 06:12 PM
1. Which version of the HDP sandbox is being used?
2. Which version of Spark is being used?
3. What IDE options are available during the exam for Python? Apart from the pyspark shell, is any IDE available, such as IPython or Zeppelin? Is there any IDE option with auto-completion from which we can submit jobs to the cluster? Please advise.
4. I have read a few posts in the Hortonworks community saying that we may use either Spark RDDs or Spark DataFrames to accomplish the tasks. Please confirm.
5. What is the average pass percentage?
Created 10-31-2016 06:23 PM
@Qi Wang : Could you please help with the above queries.
Created 10-31-2016 06:33 PM
The test environment is on AMS virtual. When I took the test, it was HDP 2.3, and I am not sure what version is used now. You could use the current sandbox for your practice. Spark is something later than 1.4, probably 1.5, but the material covered is all basic RDD and DataFrame work that is not closely tied to newer versions. The test environment has no IDE; you use either gedit or vi, based on your preference, and debug with spark-shell or pyspark.
A couple of notes on the exam:
1. Know the RDD and DataFrame APIs well. Go through all the docs on the test web page.
2. Know how to import and export RDDs/DataFrames from/to CSV files.
3. There is no limit on how you finish a task, so choose the technique you are most familiar with, whether the API or Spark SQL.
4. The test environment is quite slow to respond, so be patient with it and leave enough time for the tasks.
Good luck taking the exam.
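For point 2 above, here is a minimal sketch of CSV import/export with the plain Spark 1.x RDD API (no external packages). The helper functions and paths are illustrative, not from the exam; the pyspark calls are shown in comments so the helpers themselves stay plain Python:

```python
# Helpers for moving between CSV lines and field lists, as used with the
# Spark 1.x RDD API. These run inside rdd.map(); the pyspark calls that
# would use them in the exam environment are shown in comments.

def parse_csv_line(line):
    """Split one comma-delimited line into a list of string fields.
    Naive: does not handle quoted fields containing commas."""
    return line.strip().split(",")

def to_csv_line(fields):
    """Join a sequence of values back into one comma-delimited line."""
    return ",".join(str(f) for f in fields)

# With a SparkContext `sc` (e.g. in the pyspark shell):
#   rdd = sc.textFile("/user/me/input.csv").map(parse_csv_line)
#   ... transformations ...
#   rdd.map(to_csv_line).saveAsTextFile("/user/me/output_csv")

print(parse_csv_line("1,alice,3000"))    # ['1', 'alice', '3000']
print(to_csv_line((1, "alice", 3000)))   # 1,alice,3000
```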
Created 10-31-2016 07:32 PM
Thank you for your prompt response. Could you please help with the queries below?
The current sandbox version is HDP 2.5, and the supported Spark version is 1.6.2.
1. In the sandbox which I have downloaded, only vi is available; there is no gedit. Do we need to install gedit?
2. I have learned that the Apache Spark documentation and the Hortonworks Spark documentation are available during the exam.
Apache Spark documentation: https://spark.apache.org/docs . Is this the right link?
Hortonworks Spark documentation: what is the Hortonworks Spark documentation link?
Thanks in advance.
Created 11-01-2016 01:42 AM
The exam is set up on Ubuntu, with a CentOS VM running HDP. gedit is on the Ubuntu side. I guess you could install gedit in your own environment, but it is very easy to use, so no worries. If you really want to try out the test environment, try the HDPCD practice exam, which is very similar.
http://hortonworks.com/wp-content/uploads/2015/02/HDPCD-PracticeExamGuide1.pdf
The documentation will be accessible during the exam. It is the link you used for the Apache site, http://spark.apache.org/docs . The Hortonworks documentation is under http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/index.html . Stick with the Apache documentation, as the exam is not really anything Hortonworks-specific.
There is no way to change the exam environment; you have very limited permissions.
Created 11-01-2016 02:28 PM
@Qi Wang Thanks a lot, that really helps.
Created 10-31-2016 07:46 PM
3. Is there any way to get IntelliSense/auto-completion working in the HDP environment for Spark in Python, either using vi/gedit or the pyspark shell?
Thanks in advance.
Created 11-03-2016 04:52 PM
In Spark 1.6.x, RDDs/DataFrames have built-in functions to write only to the formats below:
rdd.saveAsTextFile / saveAsSequenceFile
df.write.orc / json / parquet / text / saveAsTable
Query: I am sure we cannot download other CSV packages (e.g. spark-csv from Databricks) during the test. Is there any way to write the output file in CSV format? Please advise.
Thanks in advance.
Created 11-03-2016 04:58 PM
Yes, and check my answer on another thread.
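One common workaround without external packages (a sketch, assuming simple fields with no embedded commas; the helper name and paths are illustrative) is to format each DataFrame row as a delimited string and fall back to saveAsTextFile:

```python
# Writing a DataFrame as CSV in Spark 1.6 without the spark-csv package:
# convert each Row to a comma-joined string, then save as a text file.
# The row formatter is plain Python; the pyspark calls are in comments.

def row_to_csv(row):
    """Format one row (any sequence of values) as a CSV line.
    Naive: assumes fields contain no commas, quotes, or newlines."""
    return ",".join("" if v is None else str(v) for v in row)

# With a DataFrame `df` (e.g. built via sqlContext in Spark 1.6):
#   df.rdd.map(row_to_csv).saveAsTextFile("/user/me/result_csv")
# pyspark Row objects behave like tuples, so row_to_csv handles them directly.

print(row_to_csv((1, "alice", None, 3000.0)))  # 1,alice,,3000.0
```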
Created 11-06-2016 03:49 AM
Is it safe to assume that the Databricks package will be available during the test to read and write CSV files?