Master Collaborator

Spark Python Integration Test Result Exceptions

 

In this article, I list exceptions raised when running PySpark code, together with the Python and Spark versions on which each one occurs. Keep an eye on this article; I will add more exceptions and solutions over time.

1. TypeError: an integer is required (got type bytes)

Exception:

Traceback (most recent call last):
  File "/opt/pyspark_udf_example.py", line 3, in <module>
    from pyspark.sql import SparkSession
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 46, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/spark/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
  File "<frozen zipimport>", line 259, in load_module
  File "/opt/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 146, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 127, in _make_cell_set_template_code
TypeError: an integer is required (got type bytes)

Python Versions:

  • Python 3.8
  • Python 3.9

Spark Versions:

  • Spark Version: 2.3.0
  • Spark Version: 2.3.1
  • Spark Version: 2.3.2
  • Spark Version: 2.3.3
  • Spark Version: 2.3.4
  • Spark Version: 2.4.0
  • Spark Version: 2.4.1
  • Spark Version: 2.4.2
  • Spark Version: 2.4.3
  • Spark Version: 2.4.4
  • Spark Version: 2.4.5
  • Spark Version: 2.4.6
  • Spark Version: 2.4.7
  • Spark Version: 2.4.8
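
A possible explanation, offered as a hedged note rather than a confirmed diagnosis: the final frame is in cloudpickle's `_make_cell_set_template_code`, which in these Spark releases appears to build a code object by passing positional arguments to `types.CodeType`. Python 3.8 added a `posonlyargcount` parameter to that constructor, so later positional arguments shift by one and a bytes value lands where an integer is expected. The sketch below shows the keyword-based `code.replace()` method, available since Python 3.8, which does not depend on argument order. The helper name `rename_code` is made up for illustration and is not part of PySpark or cloudpickle.

```python
import types


def sample():
    return 1


def rename_code(code_obj, new_name):
    """Return a copy of code_obj with a different co_name.

    Hypothetical helper for illustration only. It relies on code.replace(),
    which exists on Python 3.8 and later, instead of calling types.CodeType
    with a long positional argument list.
    """
    if not hasattr(code_obj, "replace"):
        raise RuntimeError("code.replace() requires Python 3.8 or later")
    return code_obj.replace(co_name=new_name)


if __name__ == "__main__":
    renamed = rename_code(sample.__code__, "renamed_sample")
    # Wrap the copied code object in a function to show it still runs.
    func = types.FunctionType(renamed, globals())
    print(renamed.co_name, func())
```

This only illustrates the interpreter change; it does not patch the cloudpickle copy bundled inside `pyspark.zip`.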

2. TypeError: 'bytes' object cannot be interpreted as an integer

Exception:

Traceback (most recent call last):
  File "/opt/pyspark_udf_example.py", line 3, in <module>
    from pyspark.sql import SparkSession
  File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 46, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 146, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 127, in _make_cell_set_template_code
TypeError: 'bytes' object cannot be interpreted as an integer

Python Versions:

  • Python 3.10

Spark Versions:

  • Spark Version: 2.3.0
  • Spark Version: 2.3.1
  • Spark Version: 2.3.2
  • Spark Version: 2.3.3
  • Spark Version: 2.3.4
  • Spark Version: 2.4.0
  • Spark Version: 2.4.1
  • Spark Version: 2.4.2
  • Spark Version: 2.4.3
  • Spark Version: 2.4.4
  • Spark Version: 2.4.5
  • Spark Version: 2.4.6
  • Spark Version: 2.4.7
  • Spark Version: 2.4.8

3. TypeError: code expected at least 16 arguments, got 15

Exception:

Traceback (most recent call last):
  File "/opt/pyspark_udf_example.py", line 3, in <module>
    from pyspark.sql import SparkSession
  File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 46, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 146, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 127, in _make_cell_set_template_code
TypeError: code expected at least 16 arguments, got 15

Python Versions:

  • Python 3.11

Spark Versions:

  • Spark Version: 2.3.0
  • Spark Version: 2.3.1
  • Spark Version: 2.3.2
  • Spark Version: 2.3.3
  • Spark Version: 2.3.4
  • Spark Version: 2.4.0
  • Spark Version: 2.4.1
  • Spark Version: 2.4.2
  • Spark Version: 2.4.3
  • Spark Version: 2.4.4
  • Spark Version: 2.4.5
  • Spark Version: 2.4.6
  • Spark Version: 2.4.7
  • Spark Version: 2.4.8

4. TypeError: code() argument 13 must be str, not int

Exception:

Traceback (most recent call last):
  File "/opt/pyspark_udf_example.py", line 3, in <module>
    from pyspark.sql import SparkSession
  File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 51, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/context.py", line 30, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 71, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 209, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 172, in _make_cell_set_template_code
TypeError: code() argument 13 must be str, not int

Python Versions:

  • Python 3.11

Spark Versions:

  • Spark Version: 3.0.0
  • Spark Version: 3.0.1
  • Spark Version: 3.0.2
  • Spark Version: 3.0.3

5. SyntaxError: invalid syntax

Exception:

Traceback (most recent call last):
  File "/opt/pyspark_udf_example.py", line 3, in <module>
    from pyspark.sql import SparkSession
  File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 53, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 34, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/java_gateway.py", line 31, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/find_spark_home.py", line 68
    print("Could not find valid SPARK_HOME while searching {0}".format(paths), file=sys.stderr)
                                                                                   ^
SyntaxError: invalid syntax

Python Versions:

  • Python 2.7

Spark Versions:

  • Spark Version: 3.1.1
  • Spark Version: 3.1.2
  • Spark Version: 3.1.3
  • Spark Version: 3.2.0
  • Spark Version: 3.2.1
  • Spark Version: 3.2.2
  • Spark Version: 3.2.3

6. ImportError: No module named 'typing'

Exception:

Traceback (most recent call last):
  File "/opt/pyspark_udf_example.py", line 3, in <module>
    from pyspark.sql import SparkSession
  File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 53, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 34, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/java_gateway.py", line 32, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 67, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/cloudpickle/__init__.py", line 4, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle.py", line 54, in <module>
ImportError: No module named 'typing'

Python Versions:

  • Python 3.4

Spark Versions:

  • Spark Version: 3.1.1
  • Spark Version: 3.1.2
  • Spark Version: 3.1.3
  • Spark Version: 3.2.0
  • Spark Version: 3.2.1
  • Spark Version: 3.2.2
  • Spark Version: 3.2.3
  • Spark Version: 3.3.0
  • Spark Version: 3.3.1

7. AttributeError: 'NoneType' object has no attribute 'items'

Exception:

Traceback (most recent call last):
  File "/opt/pyspark_udf_example.py", line 3, in <module>
    from pyspark.sql import SparkSession
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 957, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 634, in _load_backward_compatible
  File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 53, in <module>
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 957, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 634, in _load_backward_compatible
  File "/opt/spark/python/lib/pyspark.zip/pyspark/rdd.py", line 48, in <module>
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 957, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 634, in _load_backward_compatible
  File "/opt/spark/python/lib/pyspark.zip/pyspark/traceback_utils.py", line 23, in <module>
  File "/opt/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 390, in namedtuple
AttributeError: 'NoneType' object has no attribute 'items'

Python Versions:

  • Python 3.5

Spark Versions:

  • Spark Version: 3.1.1
  • Spark Version: 3.1.2
  • Spark Version: 3.1.3
  • Spark Version: 3.2.0
  • Spark Version: 3.2.1
  • Spark Version: 3.2.2
  • Spark Version: 3.2.3
  • Spark Version: 3.3.2
  • Spark Version: 3.4.0

8. SyntaxError: invalid syntax

Exception:

Traceback (most recent call last):
  File "/opt/pyspark_udf_example.py", line 3, in <module>
    from pyspark.sql import SparkSession
  File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 71
    def since(version: Union[str, float]) -> Callable[[F], F]:
                     ^
SyntaxError: invalid syntax

Python Versions:

  • Python 2.7

Spark Versions:

  • Spark Version: 3.3.0
  • Spark Version: 3.3.1
  • Spark Version: 3.3.2
  • Spark Version: 3.4.0

9. SyntaxError: invalid syntax

Exception:

Traceback (most recent call last):
  File "/opt/pyspark_udf_example.py", line 3, in <module>
    from pyspark.sql import SparkSession
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 957, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 664, in _load_unlocked
  File "<frozen importlib._bootstrap>", line 634, in _load_backward_compatible
  File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 53, in <module>
  File "<frozen importlib._bootstrap>", line 968, in _find_and_load
  File "<frozen importlib._bootstrap>", line 953, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 896, in _find_spec
  File "<frozen importlib._bootstrap_external>", line 1171, in find_spec
  File "<frozen importlib._bootstrap_external>", line 1147, in _get_spec
  File "<frozen importlib._bootstrap_external>", line 1128, in _legacy_get_spec
  File "<frozen importlib._bootstrap>", line 444, in spec_from_loader
  File "<frozen importlib._bootstrap_external>", line 565, in spec_from_file_location
  File "/opt/spark/python/lib/pyspark.zip/pyspark/conf.py", line 110
    _jconf: Optional[JavaObject]
          ^
SyntaxError: invalid syntax

Python Versions:

  • Python 3.5

Spark Versions:

  • Spark Version: 3.3.0
  • Spark Version: 3.3.1
  • Spark Version: 3.3.2
  • Spark Version: 3.4.0

Note: All of the above exceptions occurred while testing a sample PySpark UDF example with different Python versions.
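
The combinations in sections 1 to 3 can be turned into a small pre-flight check that runs before `pyspark` is imported. The sketch below is only an illustration: the names `KNOWN_IMPORT_FAILURES` and `expected_import_failure` are hypothetical, it covers only the Spark 2.3.x and 2.4.x rows from those three sections, and a result of `None` means the combination is not listed there, not that it is supported.

```python
import sys

# Transcribed from sections 1 to 3: Spark 2.3.x and 2.4.x fail at import
# time on these Python versions with the listed exception.
SPARK_2X_SERIES = {(2, 3), (2, 4)}
KNOWN_IMPORT_FAILURES = {
    (3, 8): "TypeError: an integer is required (got type bytes)",
    (3, 9): "TypeError: an integer is required (got type bytes)",
    (3, 10): "TypeError: 'bytes' object cannot be interpreted as an integer",
    (3, 11): "TypeError: code expected at least 16 arguments, got 15",
}


def expected_import_failure(spark_version, python_version=None):
    """Return the exception listed for this combination, or None.

    spark_version is a string such as "2.4.8"; python_version is a
    (major, minor) tuple and defaults to the running interpreter.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    spark_series = tuple(int(part) for part in spark_version.split(".")[:2])
    if spark_series not in SPARK_2X_SERIES:
        return None  # outside the rows this sketch covers
    return KNOWN_IMPORT_FAILURES.get(tuple(python_version))


if __name__ == "__main__":
    print(expected_import_failure("2.4.8", (3, 8)))
    print(expected_import_failure("2.4.8", (3, 6)))
```

The table could be extended with the Spark 3.x rows in the same way if a fuller check is needed.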

Comments
New Contributor

Hi,

Is it possible to obtain the source code of pyspark_udf_example.py?

/opt/pyspark_udf_example.py

We would like to run some compatibility tests for our Python and Spark versions.

Thank you.

 

 

Master Collaborator

Hi @Leonm 

We have already published the Python versions supported by Spark in the article below:

https://community.cloudera.com/t5/Community-Articles/Spark-Python-Supportability-Matrix/ta-p/379144

Please let me know if you still need the PySpark UDF example for testing.

New Contributor

Yes, please.

We may need that UDF example code to test our environment in the future.

Thank you for the help.

 

New Contributor

Thanks a lot!

New Contributor

I am using the versions below:

Spark 2.4.8

Python 3.6.8

I get the above error only when running spark-submit from NiFi or Oozie; it works fine when I run it from the shell. Is there a solution, or a configuration I missed?

 

from pyspark.sql import SparkSession
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 51, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/context.py", line 31, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/accumulators.py", line 97, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 72, in <module>
File "<frozen importlib._bootstrap>", line 991, in _find_and_load
File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 655, in _load_unlocked
File "<frozen importlib._bootstrap>", line 618, in _load_backward_compatible
File "<frozen zipimport>", line 259, in load_module
File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 145, in <module>
File "/opt/cloudera/parcels/CDH-7.1.9-1.cdh7.1.9.p14.53489573/lib/spark/python/lib/pyspark.zip/pyspark/cloudpickle.py", line 126, in _make_cell_set_template_code
TypeError: an integer is required (got type bytes)
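
One thing worth checking, given that the article above lists this TypeError for Python 3.8 and 3.9 rather than 3.6: the NiFi or Oozie launcher may be starting a different interpreter than the interactive shell. The following is a minimal diagnostic sketch, not a confirmed fix. It assumes it is placed at the top of the submitted script, before the `pyspark` import, and it prints the interpreter actually in use together with the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables that Spark reads to choose a Python executable. The function name `describe_python_environment` is made up for illustration.

```python
import os
import sys


def describe_python_environment():
    """Collect the details that decide which Python runs the PySpark job."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
        # These may be unset (None) if the launcher does not export them.
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON"),
        "PYSPARK_DRIVER_PYTHON": os.environ.get("PYSPARK_DRIVER_PYTHON"),
    }


if __name__ == "__main__":
    for key, value in describe_python_environment().items():
        print("{}: {}".format(key, value))
```

If the output under NiFi or Oozie shows a different executable or version than the shell does, pointing `PYSPARK_PYTHON` at the intended interpreter in that launcher's environment would be the next thing to try.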