How to find category of product column based on customer reviews text field using pig

New Contributor

step1:

I have a CSV file with the columns customer_id, reviews, date, and product_id, which I am loading into a Pig relation.

step2:

The product_id is present, but the CSV file has no product category. How can I create a product category based on the existing reviews column?

problem statement:

For example, the reviews column contains text like this: "product is dog biscuits was so nice my dog was healthy looking so good", which belongs to the "pet food" product category.

Please send me some sample code.

Please help me work out how to create the product category from the review text.

Thanks in advance,

swathi.T

1 REPLY

Re: How to find category of product column based on customer reviews text field using pig

Contributor

This is an NLP text classification problem, and you'll run into a few roadblocks along the way. Unless you already have a trained model that classifies your text, built with supervised or unsupervised methods, there are no simple code examples. At the most basic level, without using machine learning, you could use regular expressions to test for specific words, although the expressions would get complicated. Even then, you would end up with a large set of rules that would be hard to maintain.
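
As a rough illustration of the keyword approach described above, the sketch below assigns a category by counting whole-word matches against a per-category word list. The category names, the keyword lists, and the function name `categorize` are all assumptions made up for this example, not taken from your data; a real rule set would need far more terms, which is exactly the maintenance problem mentioned above.

```python
import re

# Hypothetical keyword rules (assumed for illustration): category -> trigger words.
CATEGORY_KEYWORDS = {
    "pet food": ["dog", "cat", "puppy", "kitten", "biscuits", "kibble"],
    "electronics": ["battery", "charger", "screen", "headphones"],
    "books": ["novel", "author", "chapter", "paperback"],
}

# One case-insensitive, whole-word pattern per category.
CATEGORY_PATTERNS = {
    category: re.compile(
        r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b",
        re.IGNORECASE,
    )
    for category, words in CATEGORY_KEYWORDS.items()
}


def categorize(review):
    """Return the category with the most keyword hits, or 'unknown'."""
    if not review:
        return "unknown"
    best_category, best_hits = "unknown", 0
    for category, pattern in CATEGORY_PATTERNS.items():
        hits = len(pattern.findall(review))
        if hits > best_hits:
            best_category, best_hits = category, hits
    return best_category


print(categorize("product is dog biscuits was so nice my dog was healthy looking so good"))
```

Pig can register Python files as UDFs (for example with `REGISTER ... USING jython`), but the function also needs an output schema declared for Pig, so check the Pig UDF documentation for the exact form before using something like this in a script.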

You may want to search for pre-existing NLP classification datasets that are already labeled, or for models that have already been trained to recognize these categories. I don't know of any offhand.

If you do find one, things become much easier: you could create a UDF that runs the review text against the model and returns the label.
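
To make the "run the text against the model" idea concrete, here is a minimal sketch of a word-count Naive Bayes classifier with add-one smoothing, written in plain Python. Everything in it is an assumption for illustration: the four training sentences are invented toy data, and `NaiveBayesModel` and `classify_review` are hypothetical names. In practice the model would be trained offline on a real labeled dataset and only loaded inside the UDF.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesModel(object):
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of reviews
        self.vocabulary = set()

    def train(self, labeled_reviews):
        for text, label in labeled_reviews:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocabulary.add(word)

    def predict(self, text):
        total_docs = float(sum(self.doc_counts.values()))
        vocab_size = len(self.vocabulary)
        words = tokenize(text)
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word]
                score += math.log((count + 1.0) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data, for illustration only.
TOY_TRAINING_DATA = [
    ("my dog loves these biscuits", "pet food"),
    ("the cat eats this kibble every day", "pet food"),
    ("battery life of this phone is great", "electronics"),
    ("the charger stopped working after a week", "electronics"),
]

MODEL = NaiveBayesModel()
MODEL.train(TOY_TRAINING_DATA)


def classify_review(review):
    """The function a UDF would call: review text in, category label out."""
    if not review:
        return None
    return MODEL.predict(review)
```

A model this small will mislabel most real reviews; the point is only the shape of the solution, where the UDF body is a thin wrapper around a `predict` call.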

I know this isn't the answer you're looking for, but I hope it puts you on the right path.