How to derive a product category column from the customer reviews text field using Pig


I have a CSV file with the columns customer_id, reviews, date, and product_id, which I am loading into a Pig relation.


product_id is present, but the product category is not in the CSV file. How can I create a product category based on the existing reviews column?

Problem statement:

For example, the reviews column contains text like: "product is dog biscuits was so nice my dog was healthy looking so good", which belongs to the "pet food" product category.

Please send me a sample code example.

Please help me work out how to create the product category from the review text.

Thanks in advance.




This is an NLP text classification problem, and you'll run into a few roadblocks along the way. Unless you already have a trained model that classifies your reviews, built with supervised or unsupervised methods, there are no simple code examples. At the most basic level, without machine learning, you could use regular expressions to test for specific words, although the patterns get complicated quickly. Even then, you'll end up with so many rules that they become unmaintainable.
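To make the regex idea concrete, here is a minimal sketch of what such rules might look like as a Python function that Pig could call as a Jython UDF. The category names, the keyword patterns, and the "unknown" fallback are all assumptions I made up for illustration; they are not derived from your data. The small shim at the top only exists so the file also runs outside Pig, where the outputSchema decorator is not defined.

```python
import re

# Pig provides the outputSchema decorator when a script is registered
# with jython. This fallback lets the same file run as plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        return lambda func: func

# Hypothetical rules: (category, pattern). Order matters, first match wins.
RULES = [
    ("pet food", re.compile(r"\b(dog|cat)\s+(biscuits?|food|treats?)\b", re.I)),
    ("electronics", re.compile(r"\b(charger|headphones?|battery)\b", re.I)),
]


@outputSchema("category:chararray")
def categorize(review):
    """Return the first category whose pattern matches the review text."""
    if review is None:
        return None
    for category, pattern in RULES:
        if pattern.search(review):
            return category
    return "unknown"
```

Every new product category needs its own hand-written pattern, and overlapping keywords force you to think about rule order, which is exactly where this approach stops scaling.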

You may want to search for existing NLP classification datasets that are already labeled with these categories, or for models that have already been trained to recognize them. I don't know of any offhand.

If you do find one, things become much easier: you could create a UDF that runs the review text against the model and returns the label.
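As a rough sketch of that UDF shape, the Python below trains a tiny Naive Bayes classifier with add-one smoothing on a handful of labeled examples and exposes it as a function. The four training examples, the category names, and the function names are hypothetical stand-ins for a real labeled dataset or pre-trained model, which you would load in their place. It sticks to the standard library so it can run under Pig's Jython support.

```python
import math
import re
from collections import defaultdict

# Pig provides outputSchema when the script is registered with jython.
# This fallback lets the same file run as plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        return lambda func: func

# Hypothetical labeled examples; a real dataset would be far larger.
LABELED = [
    ("dog biscuits my dog loved them", "pet food"),
    ("cat food kibble my kitten eats it", "pet food"),
    ("phone charger cable stopped working", "electronics"),
    ("headphones battery sound quality great", "electronics"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(lambda: defaultdict(int))
    doc_counts = defaultdict(int)
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return word_counts, doc_counts, vocab


# Trained once when the script is loaded, then reused for every row.
MODEL = train(LABELED)


@outputSchema("category:chararray")
def classify(review):
    """Return the label with the highest Naive Bayes log score."""
    if review is None:
        return None
    word_counts, doc_counts, vocab = MODEL
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(review) if t in vocab]  # skip unseen words
    best_label, best_score = None, None
    for label in sorted(doc_counts):
        score = math.log(float(doc_counts[label]) / total_docs)
        label_total = sum(word_counts[label].values())
        for token in tokens:
            count = word_counts[label].get(token, 0)
            # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
            score += math.log((count + 1.0) / (label_total + len(vocab)))
        if best_score is None or score > best_score:
            best_label, best_score = label, score
    return best_label
```

If the file were saved as, say, classify_udf.py (a name I'm assuming), you would register it in Pig with REGISTER 'classify_udf.py' USING jython AS nlp; and then call nlp.classify(reviews) inside a FOREACH ... GENERATE. With only four training examples the predictions are not meaningful; the point is only the structure of model plus UDF.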

I know this isn't the answer you're looking for, but I hope it puts you on the right path.
