Cloudera Kafka PySpark KafkaProducer (ImportError: No module named kafka)


I need help with Kafka on Cloudera.
I wrote a PySpark program in PyCharm, and it works fine there:


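from kafka import KafkaProducer
from kafka.errors import KafkaError
# Broker list as "host:port" strings (left blank in the original post)
producer = KafkaProducer(bootstrap_servers=[''])
# Message values must be bytes unless a value_serializer is configured
tes = producer.send('my-first-topic', b"this message from pyspark")
producer.flush()  # block until buffered messages have been sent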


But when I run it on my Linux Cloudera machine with spark2-submit, I get:

File "/home/cloudera/kafka/", line 1, in <module>
from kafka import KafkaProducer
ImportError: No module named kafka



Hey @AndyTech,


Thanks for reaching out to the Cloudera community.


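This error occurs because the "kafka-python" package (which is imported as "kafka") is not installed in the Python environment that spark2-submit uses; PyCharm was most likely running a different interpreter in which it is installed. Install "kafka-python" manually with the following command on the edge node and on every host where the Spark job executes. Make sure the pip you run belongs to the same Python interpreter that Spark uses (the one set by PYSPARK_PYTHON, if you have set it).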


$ pip install kafka-python
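To confirm the install on a given host, a small check like the one below can be run with the same interpreter Spark uses. This is only a sketch: the helper name module_available is hypothetical, and it assumes Python 3.4+ (importlib.util.find_spec does not exist on Python 2, where running python -c "import kafka" serves the same purpose).

```python
import importlib.util
import sys


def module_available(name):
    """Return True if `name` can be imported by the current interpreter (hypothetical helper)."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The pip package is named "kafka-python", but its import name is "kafka".
    print("Interpreter:", sys.executable)
    if module_available("kafka"):
        print("kafka module found")
    else:
        print("kafka module missing: install kafka-python for this interpreter")
```

Printing sys.executable makes it easy to spot when pip installed the package into a different Python than the one spark2-submit launches.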