PySpark fails with Python 3.6


Hi All,


We are currently using Spark 1.6 on the CDH 5.10 platform and are upgrading from Python 2.7 to Python 3.6 using the Anaconda distribution. When I run spark-submit in client mode, the process fails with the error below:


File "/apps/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/spark/python/lib/", line 381, in namedtuple
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'


We are not clear about the cause of the failure. We checked the Spark documentation, and it says that Spark 1.6.0 is compatible with Python 3.0+.


Any thoughts or suggestions would be helpful.






Based on the error message you shared:
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'

This error corresponds to JIRA bug SPARK-19019 [1], which is a compatibility issue between Spark and Python 3.6.

Spark 1.6 requires Python 2.6+ as per the documentation [2].
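
For anyone curious why the message complains about keyword-only arguments, here is a minimal sketch of the mechanism as I understand it from the JIRA. It assumes the cause is a function copy that drops keyword-only defaults. The names make_point and copy_func are made up for illustration and are not PySpark's actual code.

```python
import types


def make_point(name, *, verbose=False, rename=False):
    """Stand-in for a function with keyword-only defaults (hypothetical)."""
    return (name, verbose, rename)


def copy_func(f):
    """Copy a function the naive way; __kwdefaults__ is NOT carried over."""
    return types.FunctionType(
        f.__code__, f.__globals__, f.__name__, f.__defaults__, f.__closure__
    )


# The naive copy has lost the keyword-only defaults, so the call fails.
broken = copy_func(make_point)
try:
    broken("P")
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Restoring __kwdefaults__ on the copy makes the call work again.
fixed = copy_func(make_point)
fixed.__kwdefaults__ = make_point.__kwdefaults__
result = fixed("P")

print(error_message)
print(result)
```

The first call fails with a TypeError about missing keyword-only arguments, which is the same shape as the error in the question. Copying __kwdefaults__ as well avoids it.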



Thanks, agreed. I also found the bug details.


The URL you shared says Spark is compatible with Python 2.6+ and 3.1+, which is misleading, since 3.6 is 3.1+.


I have started upgrading my app to Spark 2. Any suggestions for a Spark 1.6 to Spark 2 migration guide on a Cloudera cluster?