How to consume Kafka messages using Pig?

Expert Contributor

I would like to know:

How can we consume Kafka topic messages using Pig?

Which JAR files does this require?

Any suggestions are welcome.

Mohan.V

1 ACCEPTED SOLUTION

Super Guru

I would be interested to know as well, but to the best of my knowledge there is no storage function (loader) in Pig that supports consuming messages from Kafka.

Pig is well suited for data at rest, not for data in motion or streaming.

For publishing data into Kafka, you can leverage the Kafka bridge (the hadoop-producer contrib module):

https://github.com/kafka-dev/kafka/tree/master/contrib/hadoop-producer

2 REPLIES 2

Master Mentor

@Mohan V Although there have been efforts to make it work, there is no supported way to use Kafka directly from Pig. You can use something like Apache NiFi to read from Kafka, write the messages to HDFS, and then process them with Pig. Since Kafka produces messages continuously while a Pig job has a defined start and end, Pig really isn't a good fit for this. All that said, here's an attempt to make it work: http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3C-3358174115189989131@unknownmsgid%3...
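
To make the "continuous stream vs. job with a start and end" point concrete, here is a minimal Python sketch of the landing step. It is not NiFi, Kafka, or Pig code. `fake_topic` is a hypothetical stand-in for a Kafka consumer, and a local directory stands in for HDFS. The function name `land_batches`, the `part-NNNNN.txt` file naming, and the batch sizes are assumptions made for illustration only.

```python
import itertools
import os
import tempfile
from typing import Iterable, Iterator, List


def land_batches(messages: Iterable[str], out_dir: str,
                 batch_size: int, max_batches: int) -> List[str]:
    """Cut a possibly endless message stream into closed, bounded files.

    Each file is a finished unit that a batch job can read from start to end.
    """
    os.makedirs(out_dir, exist_ok=True)
    stream: Iterator[str] = iter(messages)
    paths: List[str] = []
    for index in range(max_batches):
        # Take at most batch_size messages; the stream keeps its position.
        batch = list(itertools.islice(stream, batch_size))
        if not batch:
            break  # the stream ended early
        path = os.path.join(out_dir, f"part-{index:05d}.txt")
        with open(path, "w", encoding="utf-8") as handle:
            for message in batch:
                handle.write(message + "\n")
        paths.append(path)
    return paths


def fake_topic() -> Iterator[str]:
    """Hypothetical stand-in for a Kafka topic: it never stops producing."""
    for offset in itertools.count():
        yield f"event-{offset}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as landing_dir:
        for path in land_batches(fake_topic(), landing_dir,
                                 batch_size=3, max_batches=2):
            print(os.path.basename(path))
```

Once files like these have been closed in the landing directory, a Pig script can point an ordinary `LOAD` statement at that directory and run as a normal batch job. In a real setup, NiFi (or a similar tool) would do the landing into HDFS instead of this sketch.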