Support Questions

techsoln · ‎04-29-2019

Hi All,

We are currently using Spark 1.6 on the CDH 5.10 platform and are upgrading from Python 2.7 to Python 3.6 using the Anaconda distribution. When I run spark-submit in client mode, the process fails with the error below:

File "/apps/cloudera/parcels/CDH-5.10.1-1.cdh5.10.1.p0.10/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 381, in namedtuple
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'

We are not very clear about the cause of the failure. We have checked the Spark documentation, and it says that Spark 1.6.0 is compatible with Python 3.0+.

Any thoughts or suggestions on this would be helpful.

Thanks

Hemil

Jerry · ‎04-29-2019

Hi

Based on the error message you have shared:
...
TypeError: namedtuple() missing 3 required keyword-only arguments: 'verbose', 'rename', and 'module'

This error corresponds to the JIRA bug SPARK-19019 [1], which relates to a compatibility issue between Spark and Python 3.6.

Spark 1.6 requires Python 2.6+ as per the documentation [2].

[1] https://issues.apache.org/jira/browse/SPARK-19019
[2] https://spark.apache.org/docs/1.6.0/#downloading
[3] https://www.cloudera.com/documentation/enterprise/5-14-x/topics/spark_python.html#spark_python__sect...
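
To illustrate the kind of failure in the traceback, here is a minimal sketch. It assumes the root cause is that Python 3.6 made 'verbose', 'rename', and 'module' keyword-only arguments of namedtuple(), and that a function copied through types.FunctionType does not carry over its keyword-only defaults. The names make_record, copy_without_kwdefaults, and copy_with_kwdefaults are hypothetical stand-ins for illustration, not actual PySpark code.

```python
import types


def make_record(name, fields, *, verbose=False, rename=False, module=None):
    """Hypothetical stand-in with the same keyword-only arguments as namedtuple() in 3.6."""
    return (name, fields, verbose, rename, module)


def copy_without_kwdefaults(f):
    # Copies code, globals, name, positional defaults, and closure,
    # but NOT the keyword-only defaults stored in f.__kwdefaults__.
    return types.FunctionType(
        f.__code__, f.__globals__, f.__name__, f.__defaults__, f.__closure__
    )


def copy_with_kwdefaults(f):
    # Same copy, then restore the keyword-only defaults explicitly.
    g = copy_without_kwdefaults(f)
    g.__kwdefaults__ = f.__kwdefaults__
    return g


broken = copy_without_kwdefaults(make_record)
try:
    broken("Point", ["x", "y"])
    error_message = None
except TypeError as exc:
    # The copy lost its defaults, so the keyword-only arguments look required.
    error_message = str(exc)

fixed = copy_with_kwdefaults(make_record)
result = fixed("Point", ["x", "y"])

print(error_message)
print(result)
```

The first call fails with a TypeError about missing keyword-only arguments, much like the one in the question, while the second copy behaves like the original function.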

techsoln · ‎04-29-2019

Thanks, agreed. I also found the bug details.

The page you shared, https://spark.apache.org/docs/1.6.0/#downloading, says it is compatible with 2.6+ and 3.1+, which is totally misleading since 3.6 falls under 3.1+.
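
Until the upgrade is done, a small guard at the top of the driver script can turn this failure into a clearer message. This is only a sketch based on what this thread reports (Spark 1.6 breaking on Python 3.6); the function name python_ok_for_spark16 is hypothetical, and the check says nothing about which older Python versions are actually supported.

```python
import sys


def python_ok_for_spark16(version_info=sys.version_info):
    """Return False for Python 3.6 and later, the versions this thread reports as failing."""
    return tuple(version_info[:2]) < (3, 6)


if not python_ok_for_spark16():
    # Warn early instead of failing later inside pyspark/serializers.py.
    print(
        "Warning: Python %d.%d detected; Spark 1.6 is reported to fail on 3.6+ (SPARK-19019)."
        % tuple(sys.version_info[:2])
    )
```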

I have started working on upgrading my app to Spark 2. Are there any suggestions, or a migration guide, for moving from Spark 1.6 to Spark 2 on a Cloudera cluster?

Cloudera Community

pyspark fails with python 3.6