
Cloudera kafka pyspark KafkaProducer (ImportError: No module named kafka)


I need help with Kafka on Cloudera.
I wrote a PySpark program in PyCharm, and it works fine there:

 

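from kafka import KafkaProducer
from kafka.errors import KafkaError

# Connect to the Kafka broker
producer = KafkaProducer(bootstrap_servers=['192.168.56.103:9092'])
# Message values must be bytes unless a value_serializer is configured
tes = producer.send('my-first-topic', b"this message from pyspark")
# Block until all buffered messages have been sent
producer.flush()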

 

But when I run it on my Linux Cloudera machine, I get:

File "/home/cloudera/kafka/kproducer.py", line 1, in <module>
from kafka import KafkaProducer
ImportError: No module named kafka

 

I run it with the command spark2-submit kproducer.py

1 REPLY


Hey @AndyTech,

 

Thanks for reaching out to the Cloudera community.

 

This error occurs because the "kafka-python" module is missing from the Python installation that runs your job. Install it manually with the following command on the edge node and on every host where the Spark job executes.

 

$ pip install kafka-python
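
To confirm the install reached the right interpreter, a small check like the one below may help. It is only a sketch: the helper name check_module and the file name check_kafka.py are made up for illustration, and it assumes spark2-submit runs the same Python that you ran pip against (the interpreter is typically chosen through the PYSPARK_PYTHON environment variable). It prints which interpreter is running and whether the kafka module can be imported.

```python
import importlib
import sys


def check_module(name):
    """Try to import `name`; return (found, detail).

    `detail` is the module's file path when the import succeeds,
    or the ImportError message when it fails.
    """
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    return True, getattr(module, "__file__", "built-in")


if __name__ == "__main__":
    # Shows which Python installation is actually executing the script
    print("Python interpreter: {}".format(sys.executable))
    found, detail = check_module("kafka")
    if found:
        print("kafka module found at: {}".format(detail))
    else:
        print("kafka module NOT importable: {}".format(detail))
```

Running it once with spark2-submit check_kafka.py and once with plain python check_kafka.py on the same host shows whether both use the same interpreter. If the paths differ, install kafka-python with the pip that belongs to the interpreter Spark reports.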
