- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Do I need to install Python3 on every CDP node?
- Labels:
-
Apache Spark
Created 08-26-2021 02:58 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I will use Spark2 in CDP and need to install Python3. Do I need to installation Python3 on every node in the CDP cluster, just only need to install it on one particular node?
Spark2 job is executed in JVM containers that could be created on any worker node. I wonder whether the container is created upon a template? If yes, then how the template is created and where is it?
Thanks.
Created 08-30-2021 11:24 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Seaport
Yes it is required when if you want to run Python UDFs or do something outside spark SQL operations in your application.
If you are just using the Spark SQL API there’s no runtime requirement for python.
If you are going to install Spark3, please check below supported versions:
- Spark 2.4 supports Python 2.7 and 3.4-3.7.
- Spark 3.0 supports Python 2.7 and 3.4 and higher, although support for Python 2 and 3.4 to 3.5 is deprecated.
- Spark 3.1 supports Python 3.6 and higher.
CDS Powered by Apache Spark requires one of the following Python versions:
- Python 2.7 or higher, when using Python 2.
- Python 3.4 or higher, when using Python 3. (CDS 2.0 only supports Python 3.4 and 3.5; CDS 2.1 and higher include support for Python 3.6 and higher).
- Python 3.4 or higher, when using Python 3 (CDS 3).
Note: Spark 2.4 is not compatible with Python 3.8. The latest version recommended is Python 3.4+ (https://spark.apache.org/docs/2.4.0/#downloading). The Apache Jira SPARK-29536 related to Python 3.8 is fixed in Spark3.
Created 08-30-2021 11:24 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Seaport
Yes it is required when if you want to run Python UDFs or do something outside spark SQL operations in your application.
If you are just using the Spark SQL API there’s no runtime requirement for python.
If you are going to install Spark3, please check below supported versions:
- Spark 2.4 supports Python 2.7 and 3.4-3.7.
- Spark 3.0 supports Python 2.7 and 3.4 and higher, although support for Python 2 and 3.4 to 3.5 is deprecated.
- Spark 3.1 supports Python 3.6 and higher.
CDS Powered by Apache Spark requires one of the following Python versions:
- Python 2.7 or higher, when using Python 2.
- Python 3.4 or higher, when using Python 3. (CDS 2.0 only supports Python 3.4 and 3.5; CDS 2.1 and higher include support for Python 3.6 and higher).
- Python 3.4 or higher, when using Python 3 (CDS 3).
Note: Spark 2.4 is not compatible with Python 3.8. The latest version recommended is Python 3.4+ (https://spark.apache.org/docs/2.4.0/#downloading). The Apache Jira SPARK-29536 related to Python 3.8 is fixed in Spark3.
Created 09-03-2021 09:26 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
@Seaport Has the reply helped resolve your issue? If so, please mark the appropriate reply as the solution, as it will make it easier for others to find the answer in the future.
Regards,
Vidya Sargur,Community Manager
Was your question answered? Make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking on the thumbs up button.
Learn more about the Cloudera Community:
Created 09-03-2021 05:08 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Vidya,
Thanks for your reply. Could you help me clarify the issue further? Does Spark (or other MapReduce tool) create the container using the local host as its template (to some degree)?
Created 09-14-2021 11:38 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Hi @Seaport
As you know, resource managers like yarn, standalone, kubernets will create containers. Internally RMs will use shell script to create containers. Based on resources, it will create one or more containers in the same node.
