
Do I need to install Python3 on every CDP node?

Expert Contributor

I will use Spark2 in CDP and need to install Python3. Do I need to install Python3 on every node in the CDP cluster, or only on one particular node?

 

Spark2 jobs are executed in JVM containers that can be created on any worker node. I wonder whether the container is created from a template. If so, how is the template created, and where is it?

 

Thanks.

 

1 ACCEPTED SOLUTION

Super Collaborator

Hi @Seaport 

 

Yes, it is required if you want to run Python UDFs or do anything outside Spark SQL operations in your application.

If you are just using the Spark SQL API, there is no runtime requirement for Python.

 

If you are going to install Spark3, please check the supported versions below:

 

  • Spark 2.4 supports Python 2.7 and 3.4-3.7.
  • Spark 3.0 supports Python 2.7 and Python 3.4 and higher, although support for Python 2 and for Python 3.4-3.5 is deprecated.
  • Spark 3.1 supports Python 3.6 and higher.

 

CDS Powered by Apache Spark requires one of the following Python versions:

  • Python 2.7 or higher, when using Python 2.
  • Python 3.4 or higher, when using Python 3. (CDS 2.0 only supports Python 3.4 and 3.5; CDS 2.1 and higher include support for Python 3.6 and higher).
  • Python 3.4 or higher, when using Python 3 (CDS 3).

Note: Spark 2.4 is not compatible with Python 3.8; the latest recommended version for Spark 2.4 is Python 3.4+ (https://spark.apache.org/docs/2.4.0/#downloading). The Apache Jira SPARK-29536, related to Python 3.8 support, is fixed in Spark 3.
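A quick way to verify on each node that the installed Python falls within these ranges is a small check script. This is a minimal sketch: the version ranges are transcribed from the lists above, and the `python_ok` helper is hypothetical, not part of any Spark or CDS tooling.

```python
import sys

# Supported Python (major, minor) ranges per Spark version, as listed above.
# Each value is a list of (min_inclusive, max_inclusive) tuples.
SUPPORTED = {
    "2.4": [((2, 7), (2, 7)), ((3, 4), (3, 7))],
    "3.0": [((2, 7), (2, 7)), ((3, 4), (3, 99))],  # Py2 and 3.4-3.5 deprecated
    "3.1": [((3, 6), (3, 99))],
}

def python_ok(spark_version, py=sys.version_info[:2]):
    """Return True if the given Python (major, minor) is in a supported range."""
    return any(lo <= py <= hi for lo, hi in SUPPORTED[spark_version])

if __name__ == "__main__":
    # e.g. False on Python 3.8 for Spark 2.4, per SPARK-29536
    print(python_ok("2.4"))
```

Running this with each node's `python3` quickly flags nodes whose interpreter is outside the supported range.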


4 REPLIES


Community Manager

@Seaport Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future. 



Regards,

Vidya Sargur,
Community Manager



Expert Contributor

Vidya,

 

Thanks for your reply. Could you help me clarify the issue further? Does Spark (or another MapReduce-style tool) create the container using the local host as its template (to some degree)?

Super Collaborator

Hi @Seaport 

 

As you know, resource managers like YARN, Spark standalone, and Kubernetes create the containers. Internally, the resource manager uses shell scripts to launch containers. Depending on available resources, it may create one or more containers on the same node.
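As a toy illustration of that launch path (the function below is hypothetical and stands in for a real container launch script, which YARN generates per-container), each container shells out to whatever `python3` exists on the local worker. This is exactly why Python3 must be installed on every node that can host an executor:

```python
import shutil
import subprocess

def launch_pyspark_worker(code: str) -> str:
    """Toy stand-in for a container launch script: find python3 on *this*
    node and run a snippet in it, the way an executor spawns its workers."""
    py = shutil.which("python3")  # resolves against the local node only
    if py is None:
        # A node without Python3 cannot run Python tasks scheduled on it.
        raise RuntimeError("python3 not installed on this worker node")
    out = subprocess.run([py, "-c", code], capture_output=True, text=True)
    return out.stdout.strip()

# Each call uses the local interpreter, mirroring why every node needs one.
print(launch_pyspark_worker("print(6 * 7)"))  # → 42
```

Since the scheduler may place a container on any worker, any node missing the interpreter would fail only for the tasks that happen to land there, which is why a uniform install across the cluster is recommended.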