Created 03-03-2016 02:34 PM
Hi.
I am building a word cloud, but I don't know whether there is a function to delete junk words like "aa", "bb", "wf", or something like that.
Thanks.
Created 03-04-2016 02:18 AM
Great question once again. What you're asking for is commonly known as "stop words" removal, and there are different ways of addressing the problem. Instead of writing my own solution, here are some suggestions for you:
1. Write a map/reduce job that checks each word against a stop words collection.
2. Write a UDF in Python, Groovy or Java, whichever is most convenient for you; there are some examples in Groovy and Python.
3. I've done some work with Apache Crunch, and there's a stop words example on its front page.
4. Finally, there are a couple of suggestions for doing it in Pig. The last one is the simplest, and I am curious to try it myself. It comes from Donald Miner, a well-known champion of Apache Pig.
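To make the idea concrete, here is a minimal plain-Python sketch of the filtering step behind all of these approaches: lowercase the text, split it into words, drop anything in a stop words collection, and count what is left (the counts are what a word cloud is sized by). Assumptions: the `STOP_WORDS` set, the `MIN_LENGTH` cutoff and the function names are hypothetical and for illustration only. Tokens like "aa", "bb" and "wf" are not classic stop words, so the sketch adds a minimum-length rule to catch them. The same logic could be moved into a Python UDF or the map side of a map/reduce job, but that wiring is not shown here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list; a real one would be much
# larger and would usually be loaded from a file.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "is", "in"}

# Hypothetical extra rule: tokens shorter than this are treated as noise,
# which catches junk such as "aa", "bb" or "wf".
MIN_LENGTH = 3


def clean_tokens(text, stop_words=STOP_WORDS, min_length=MIN_LENGTH):
    """Lowercase the text, split it into alphabetic tokens, and drop
    stop words and tokens shorter than min_length."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words and len(t) >= min_length]


def word_counts(text):
    """Count the surviving words, i.e. the frequencies a word cloud would use."""
    return Counter(clean_tokens(text))


if __name__ == "__main__":
    sample = "The aa data and the bb data wf of Hadoop is in HDFS"
    print(word_counts(sample).most_common())
```

Keeping the stop list as a set makes each membership check constant time on average, which matters once the same check runs over every word of a large corpus.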
