Cloudera kafka pyspark KafkaProducer (ImportError: No module named kafka)

I need help with Kafka on Cloudera.
I wrote a PySpark program in PyCharm, and it works fine there:

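from kafka import KafkaProducer
from kafka.errors import KafkaError

# Connect to the broker and send one message to the topic.
producer = KafkaProducer(bootstrap_servers=['192.168.56.103:9092'])
# Without a value_serializer, kafka-python expects the value as bytes.
tes = producer.send('my-first-topic', b"this message from pyspark")
# Block until the buffered message has actually been sent.
producer.flush()
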
But when I run it on my Linux Cloudera machine with the command spark2-submit kproducer.py, I get:

File "/home/cloudera/kafka/kproducer.py", line 1, in <module>
from kafka import KafkaProducer
ImportError: No module named kafka

Hey @AndyTech,

Thanks for reaching out to the Cloudera community.

This error occurs because the "kafka-python" module is missing from your Python installation. You have to install it manually, using the command below, on the edge node and on every host where the Spark job executes.

$ pip install kafka-python
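
After installing, it may help to confirm that the module is visible to the same Python interpreter that runs the job, since pip can belong to a different interpreter than the one Spark uses. The following is only a minimal sketch: the helper name `module_available` is hypothetical, and it assumes the script is run with the interpreter you want to check.

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


# Show which interpreter is running and whether it can import kafka.
print(sys.executable)
print(module_available("kafka"))
```

If the second line printed is False on any host, the install did not reach the interpreter in use on that host.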