
pig word dictionary

Master Collaborator

Hi.

I am building a word cloud, but I don't know if there is any function to delete bad words like "aa", "bb", "wf", or something like that.

thanks

1 ACCEPTED SOLUTION

Master Mentor

@Roberto Sancho

Great question once again. What you're asking for is commonly known as "stop words", and there are different ways of addressing the problem. Instead of writing my own solution, here are some suggestions for you. You can write a MapReduce job with a stop-words collection. You can write a UDF in Python, Groovy or Java, whichever is convenient for you; there are some examples here in Groovy and Python. I've done some work with Apache Crunch, and there's a stop-words example on its front page. Finally, here are a couple of suggestions for doing it in Pig. The last one is the simplest, and I am curious to try it myself. It comes from Donald Miner, a famous champion of Apache Pig.
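
To make the UDF route a bit more concrete, here is a minimal sketch in plain Python of the filtering logic such a UDF might wrap. It is not taken from any of the linked examples: the names `STOP_WORDS`, `is_stop_word` and `count_words` are hypothetical, the stop-word list contains only the tokens from the question, and the Pig-specific wiring (registering the script and declaring an output schema) is left out on purpose.

```python
from collections import Counter

# Hypothetical stop-word list built from the tokens in the question;
# extend it with whatever else clutters the word cloud.
STOP_WORDS = frozenset(["aa", "bb", "wf"])


def is_stop_word(word, stop_words=STOP_WORDS):
    """Return True if the word should be dropped from the word cloud.

    The comparison ignores case and surrounding whitespace.
    None and empty strings are treated as droppable too.
    """
    if word is None:
        return True
    cleaned = word.strip().lower()
    return not cleaned or cleaned in stop_words


def count_words(words, stop_words=STOP_WORDS):
    """Count the normalized words that are not stop words."""
    return Counter(
        w.strip().lower() for w in words if not is_stop_word(w, stop_words)
    )


if __name__ == "__main__":
    sample = ["pig", "aa", "hadoop", "WF", "pig", "bb"]
    print(count_words(sample))
```

Keeping the stop words in a set makes each lookup cheap, and the same membership test carries over to the other suggestions: in Pig itself, the equivalent idea would be to filter out every word that has a match in a separately loaded stop-word relation.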
