How to consume Kafka messages using Pig?

Expert Contributor

I would like to know:

How can we consume Kafka topic messages using Pig?

Which JAR files does it require?

Any suggestions are welcome.

Mohan.V

1 ACCEPTED SOLUTION

I would be interested to know as well, but to the best of my knowledge there is no load/store function in Pig that supports consuming messages from Kafka.

Pig is well suited for data at rest, not for data in motion or streaming.

For publishing data into Kafka, you can leverage the Kafka Hadoop producer bridge:

https://github.com/kafka-dev/kafka/tree/master/contrib/hadoop-producer

Mentor

@Mohan V Although there are efforts to make it work, there is no supported way to do this directly with Kafka and Pig. You can use something like Apache NiFi to read from Kafka, write the messages to HDFS, and then consume those messages with Pig. Since Kafka can produce messages continuously while a Pig job has a defined start and end, Pig really isn't a good fit for this.

All that said, here's an attempt to make it work: http://mail-archives.apache.org/mod_mbox/pig-user/201308.mbox/%3C-3358174115189989131@unknownmsgid%3...
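To illustrate the "land it first, then run a batch job" idea, here is a minimal sketch in Python, not a NiFi or Kafka integration. It assumes the messages arrive as a plain iterable of strings; in a real setup that iterable would come from a Kafka consumer or be replaced by NiFi entirely. The function name `write_batches`, the `part-NNNNN.txt` file naming, and the local output directory are all hypothetical choices for this example.

```python
import itertools
import os
import tempfile
from typing import Iterable, Iterator, List


def write_batches(messages: Iterable[str], out_dir: str, batch_size: int) -> List[str]:
    """Drain `messages` into newline-delimited files of at most `batch_size` lines.

    Each finished file is a bounded chunk of data at rest, which is the kind
    of input a batch job with a start and an end can process.
    """
    os.makedirs(out_dir, exist_ok=True)
    it: Iterator[str] = iter(messages)
    paths: List[str] = []
    while True:
        # Take the next chunk of the stream; an empty chunk means it is drained.
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            break
        path = os.path.join(out_dir, f"part-{len(paths):05d}.txt")
        with open(path, "w", encoding="utf-8") as fh:
            for msg in batch:
                # Keep one message per line so a line-oriented loader can read it.
                fh.write(msg.replace("\n", " ") + "\n")
        paths.append(path)
    return paths


if __name__ == "__main__":
    # Stand-in for messages polled from a Kafka topic (hypothetical data).
    fake_stream = (f"event-{i}" for i in range(5))
    with tempfile.TemporaryDirectory() as tmp:
        for p in write_batches(fake_stream, tmp, batch_size=2):
            print(os.path.basename(p))
```

A real topic never ends, so a production landing job would also need to close files on a timer or size limit rather than wait for the stream to drain. Once the files are on HDFS, a Pig script can read them with an ordinary `LOAD` statement.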
