I'm using the HDP-2.4 sandbox platform for my Python application. A Kafka consumer streams messages to a Spark Streaming application. With stateless transformations everything works fine. The problem arises when I switch to a stateful transformation with updateStateByKey() and checkpoint().
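For readers unfamiliar with the pattern, a generic stateful word count with updateStateByKey() and checkpointing looks roughly like the sketch below. It is illustrative only and not the application code: the function names, the checkpoint path, the batch interval, and the socket source standing in for the Kafka stream are all assumptions.

```python
def update_count(new_values, running_count):
    """State update function passed to updateStateByKey().

    new_values: values seen for one key in the current batch.
    running_count: previous state for that key, or None on first sight.
    """
    return sum(new_values) + (running_count or 0)


def create_context(checkpoint_dir="/tmp/checkpoint"):  # hypothetical path
    # Imported inside the function so update_count can be tried without Spark.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="StatefulSketch")  # hypothetical app name
    ssc = StreamingContext(sc, 10)  # 10-second batches (assumed)
    ssc.checkpoint(checkpoint_dir)  # updateStateByKey() requires a checkpoint dir

    # Placeholder source; a Kafka stream would be created here instead.
    lines = ssc.socketTextStream("localhost", 9999)
    counts = (lines.flatMap(lambda line: line.split(" "))
                   .map(lambda word: (word, 1))
                   .updateStateByKey(update_count))
    counts.pprint()
    return ssc


def main(checkpoint_dir="/tmp/checkpoint"):  # hypothetical path
    from pyspark.streaming import StreamingContext

    # Rebuild the context from checkpoint data if present, else create it.
    ssc = StreamingContext.getOrCreate(checkpoint_dir, create_context)
    ssc.start()
    ssc.awaitTermination()
```

Calling main() from a script submitted with spark-submit would start the stream. The update function is plain Python, so it can be checked on its own before any Spark context exists.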
I get the following error:
ERROR StreamingContext: Error starting the context, marking it as stopped
java.io.IOException: org.apache.spark.SparkException: An exception was raised by Python:
Traceback (most recent call last):
File "/usr/hdp/126.96.36.199-169/spark/python/pyspark/streaming/util.py", line 105, in dumps
File "/usr/lib64/python2.6/pickle.py", line 306, in save
rv = reduce(self.proto)
TypeError: 'JavaPackage' object is not callable
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
... 61 more
The jar libraries and packages that I load are as follows:
The versions bundled with HDP-2.4 (Python 2.6, Spark 1.6.0, Kafka 0.9, Scala 2.10, and py4j-0.9-src.zip) appear to be compatible with each other, although I'm not sure about Python 2.6, which is really old.