Pig Accumulator in Spark

Are there functions in Spark that work like the Accumulator interface in Pig, where the data doesn't have to stay in memory?

1 ACCEPTED SOLUTION

I'm not aware of Spark's Accumulators being exposed as "first-class" objects in Pig. I have always advised building a UDF for this kind of work, unless you can simply get away with filtering the things to count (such as "good" records and "rejects") into separate aliases and then counting them up.

Here is a blog post that goes down the UDF path: https://dzone.com/articles/counters-apache-pig
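
To make the UDF idea a bit more concrete, here is a minimal sketch in plain Python (not Pig or Spark code) of an accumulator-style counter. It loosely mirrors the accumulate / getValue / cleanup shape of Pig's Accumulator interface: records arrive in small batches and only the running totals are kept, so the full data set never has to sit in memory. The class name `CountingAccumulator`, the `is_good` rule, and the batch size are all assumptions made up for illustration.

```python
from typing import Dict, Iterable, Iterator, List, Optional


def is_good(record: Optional[str]) -> bool:
    # Assumed validity rule for illustration: a non-blank string is "good".
    return bool(record and record.strip())


class CountingAccumulator:
    """Hypothetical counter shaped like an accumulate/getValue/cleanup contract."""

    def __init__(self) -> None:
        self.good = 0
        self.rejects = 0

    def accumulate(self, batch: Iterable[Optional[str]]) -> None:
        # Called once per batch; only the two running totals are retained.
        for record in batch:
            if is_good(record):
                self.good += 1
            else:
                self.rejects += 1

    def get_value(self) -> Dict[str, int]:
        # Return the totals gathered so far.
        return {"good": self.good, "rejects": self.rejects}

    def cleanup(self) -> None:
        # Reset state so the instance can be reused for the next group.
        self.good = 0
        self.rejects = 0


def batches(records: Iterable[Optional[str]], size: int) -> Iterator[List[Optional[str]]]:
    # Yield fixed-size chunks so at most `size` records are held at once.
    batch: List[Optional[str]] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def count_records(records: Iterable[Optional[str]], batch_size: int = 2) -> Dict[str, int]:
    acc = CountingAccumulator()
    for batch in batches(records, batch_size):
        acc.accumulate(batch)
    result = acc.get_value()
    acc.cleanup()
    return result
```

Calling `count_records` on any iterator (for example, lines streamed from a file) walks the input once and keeps at most one small batch in memory. A real Pig UDF would be written against Pig's own classes, so treat this only as an outline of the pattern.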

Good luck, and I'd love to hear if there is something directly in Pig that I've been missing all along.
