Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

How to consume Kafka messages using Pig?

Expert Contributor

I would like to know:

How can we consume Kafka topic messages using Pig?

Which JAR files does it require?

Any suggestions are welcome.

Mohan.V

1 ACCEPTED SOLUTION

Super Guru

I would be interested to know as well, but to the best of my knowledge there is no loader or storage function in Pig that supports consuming messages from Kafka.

Pig is well suited to data at rest, not to data in motion or streaming.

For publishing data into Kafka, you can leverage the Kafka Hadoop producer bridge:

https://github.com/kafka-dev/kafka/tree/master/contrib/hadoop-producer


2 REPLIES

Master Mentor

@Mohan V, although there have been efforts to make it work, there is no supported way to read from Kafka directly in Pig. You can use something like Apache NiFi to read from Kafka, write the messages to HDFS, and then process those files with Pig. Kafka produces messages continuously, whereas a Pig job has a defined start and end, so Pig really isn't a good fit for consuming a topic directly. All that said, here's an attempt to make it work: http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3C-3358174115189989131@unknownmsgid%3...
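
To illustrate the "land it first, then run Pig" idea, here is a minimal Python sketch of the landing step. It is not NiFi and not a Kafka client: `land_batch`, its parameters, and the file names are hypothetical, and `messages` is just any iterable of raw payloads standing in for a consumer. The point is only that an endless stream gets cut into bounded files, which a batch job with a start and an end can then read.

```python
import itertools
import os
import tempfile
from typing import Iterable


def land_batch(messages: Iterable[bytes], out_dir: str, batch_name: str,
               max_messages: int = 1000) -> str:
    """Write at most max_messages payloads to one newline-delimited file.

    `messages` is a stand-in for a Kafka consumer (an assumption of this
    sketch): any iterable that yields raw message payloads as bytes.
    """
    os.makedirs(out_dir, exist_ok=True)
    final_path = os.path.join(out_dir, batch_name)
    # Write under a dot-prefixed temporary name (Hadoop input formats
    # normally skip such files), then rename once the batch is complete.
    fd, tmp_path = tempfile.mkstemp(dir=out_dir, prefix=".tmp-")
    with os.fdopen(fd, "wb") as out:
        # islice bounds the batch, so an endless stream still terminates.
        for payload in itertools.islice(messages, max_messages):
            # One record per line; only a trailing newline is normalized.
            out.write(payload.rstrip(b"\n") + b"\n")
    os.replace(tmp_path, final_path)
    return final_path


if __name__ == "__main__":
    # An endless generator plays the role of a topic that never stops.
    stream = (f"event-{i}".encode() for i in itertools.count())
    path = land_batch(stream, tempfile.mkdtemp(), "batch-0001.txt",
                      max_messages=3)
    with open(path, "rb") as landed:
        print(landed.read())
```

In a real setup the iterable would come from an actual Kafka client, and the finished file would be copied to HDFS (for example with `hdfs dfs -put`) and read in Pig with a plain `LOAD ... USING TextLoader()` or `PigStorage`. Payloads that contain embedded newlines would need a different record format than the one assumed here.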