Spark certification: HDP Sandbox Pyspark version is 2.6.6 but certification Data sheet mentions 2.7.6

Hello,

I recently downloaded the HDP 2.4 Sandbox to prepare for the Spark certification. Everything matches the datasheet except PySpark, which is 2.6.6 on the sandbox while the datasheet mentions 2.7.6. Do I need to upgrade PySpark to 2.7.6 in my HDP Sandbox?

Re: Spark certification: HDP Sandbox Pyspark version is 2.6.6 but certification Data sheet mentions 2.7.6

@Amardas Gundu

HDP 2.4 comes with Spark 1.6, so the PySpark version is also 1.6.

The datasheet is referring to the Python version, not the PySpark version.

Most probably the OS default Python on HDP 2.4 is 2.6.x, in which case you need to install the newer version, 2.7.6, manually. I recommend installing it in a separate folder to avoid problems with any other services that already use the 2.6.x version.
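To confirm which number you are actually looking at, a quick check of the interpreter version may help. This is only a minimal sketch: the helper name `python_version_string` is made up for illustration, and it uses `%` formatting so it should also run on the older 2.6.x interpreter.

```python
import sys

def python_version_string(info=None):
    """Return 'major.minor.micro' for the running interpreter,
    or for a version tuple passed in explicitly."""
    info = sys.version_info if info is None else info
    return "%d.%d.%d" % (info[0], info[1], info[2])

# Version of the Python interpreter that runs this script.
print("Python interpreter: " + python_version_string())

# The Spark version is a separate number. Inside a pyspark shell it is
# available from the SparkContext as sc.version.
```

If this prints 2.6.6 on the sandbox, then the 2.6.6 you saw is the Python interpreter, not PySpark.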

I usually install Anaconda Python, which is easy to set up and isolated from the OS one. Here is a link that shows how to do this:

https://community.hortonworks.com/articles/194089/how-to-install-conda-anaconda-or-miniconda.html

Then, in order to use this Python version with PySpark, you need to export some environment variables.

In the following example, Anaconda Python is installed under /opt/anaconda2/bin/python:

export PYSPARK_DRIVER_PYTHON=/opt/anaconda2/bin/python

export PYSPARK_PYTHON=/opt/anaconda2/bin/python
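After exporting the variables, it can be worth checking that they are visible to the process that launches PySpark. The snippet below is a rough sketch, not part of any Spark API: `pyspark_python_settings` is a hypothetical helper that only reads the two variables named above from the environment.

```python
import os

def pyspark_python_settings(env=None):
    """Return the two PySpark interpreter variables from the given
    environment mapping (defaults to os.environ). Unset ones map to None."""
    env = os.environ if env is None else env
    names = ("PYSPARK_PYTHON", "PYSPARK_DRIVER_PYTHON")
    return dict((name, env.get(name)) for name in names)

# Print what the current shell session would hand to pyspark.
settings = pyspark_python_settings()
for name in sorted(settings):
    print("%s = %s" % (name, settings[name]))
```

A value of None means the variable was not exported in the current shell session.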

HTH

Re: Spark certification: HDP Sandbox Pyspark version is 2.6.6 but certification Data sheet mentions 2.7.6

Thanks for answering. I am preparing for the Spark certification. The exam lab will not have the Anaconda version of Python, so I want to keep practicing in the same type of environment as the exam lab.