I recently downloaded the HDP 2.4 Sandbox for Spark certification. Everything matches the datasheet except PySpark, which reports 2.6.6 while the datasheet mentions 2.7.6. Do I need to upgrade PySpark to 2.7.6 in my HDP Sandbox?
HDP 2.4 comes with Spark 1.6, so the PySpark version is also 1.6.
The datasheet is referring to the Python version, not the PySpark version.
Most probably the OS default Python on HDP 2.4 is 2.6.x, and you need to install the newer 2.7.6 manually. I recommend installing it in a separate folder to avoid breaking any other services that already use the 2.6.x version.
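As a quick check, you can confirm which interpreter is the OS default from the shell (a minimal sketch; the exact version string depends on your sandbox image):

```shell
# Print the OS default Python version (expected to be 2.6.x on HDP 2.4)
python -V

# A separately installed Python lives under its own prefix and does not
# replace the default, e.g. an Anaconda install under /opt/anaconda2:
# /opt/anaconda2/bin/python -V
```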
I usually install Anaconda Python, which is easy to set up and isolated from the OS Python. Here are some links that show how to do this:
Then, in order to use this Python version with PySpark, you need to export some environment variables.
In the next example, Anaconda Python is installed under /opt/anaconda2/bin/python.
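A minimal sketch of those exports, assuming Anaconda Python is installed under /opt/anaconda2/bin/python:

```shell
# Point both the PySpark workers and the driver at the Anaconda interpreter
export PYSPARK_PYTHON=/opt/anaconda2/bin/python
export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python

# To make this permanent for Spark, the same lines can be added to
# spark-env.sh (on HDP, e.g. /usr/hdp/current/spark-client/conf/spark-env.sh)
```

After exporting these, launching `pyspark` from the same shell will use the Anaconda interpreter instead of the OS default.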
Thanks for answering. I am preparing for the Spark certification. The exam lab will not have the Anaconda version of Python, so I want to continue practicing in the same type of environment as the exam lab.