I have a CSV file with the columns customer_id, reviews, date and product_id, which I am loading into a Pig relation.
product_id is present, but there is no product category column in the CSV file. How can I create a product category based on the existing reviews column?
For example, the reviews column contains text like "product is dog biscuits was so nice my dog was healthy looking so good", which belongs to the "pet food" product category.
Please send me some sample code.
Please help me work out how to create the product category from the review text.
Thanks in advance.
This is an NLP (natural language processing) text classification problem, and you'll run into a few roadblocks along the way. Unless you already have a model, trained through supervised or unsupervised methods, that can classify your reviews, there is no simple code example. At the most basic level, without machine learning, you could use regular expressions (possibly complicated ones) to test for specific keywords. Even then, you'll end up with many rules that quickly become unmaintainable.
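To make the keyword/regex approach concrete, here is a minimal sketch written as a Pig Python (Jython) UDF. The category names and keyword patterns in `CATEGORY_KEYWORDS`, the function name `categorize` and the file name `categorize.py` are all hypothetical placeholders, not a real taxonomy. Each category gets one case-insensitive regex built from its keywords; the first category that matches wins, and anything else is labeled `unknown`.

```python
import re

# Pig's Jython engine provides the outputSchema decorator when it runs the
# script; define a no-op fallback so the file can also be tested with plain Python.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

# Hypothetical category -> keyword regex fragments. Order matters:
# the first category whose pattern matches is returned.
CATEGORY_KEYWORDS = [
    ("pet food", [r"dog biscuits?", r"cat food", r"kibble", r"pet food"]),
    ("electronics", [r"batter(?:y|ies)", r"charger", r"headphones", r"laptop"]),
    ("books", [r"novel", r"author", r"paperback", r"chapter"]),
]

# Precompile one case-insensitive, word-bounded pattern per category.
CATEGORY_PATTERNS = [
    (category, re.compile(r"\b(?:" + "|".join(keywords) + r")\b", re.IGNORECASE))
    for category, keywords in CATEGORY_KEYWORDS
]

@outputSchema("category:chararray")
def categorize(review):
    """Return the first category whose keywords appear in the review, else 'unknown'."""
    if review is None:
        return "unknown"
    for category, pattern in CATEGORY_PATTERNS:
        if pattern.search(review):
            return category
    return "unknown"

# Example Pig usage (relation and alias names are hypothetical):
#   REGISTER 'categorize.py' USING jython AS review_udfs;
#   raw = LOAD 'reviews.csv' USING PigStorage(',')
#         AS (customer_id:chararray, reviews:chararray, date:chararray, product_id:chararray);
#   labeled = FOREACH raw GENERATE customer_id, product_id, review_udfs.categorize(reviews) AS category;
# If review text can contain commas, piggybank's CSVExcelStorage handles quoted fields.
```

This also illustrates the maintenance problem: every new product line or phrasing means editing the keyword lists by hand, and overlapping keywords make the category order significant.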
You may want to search for existing NLP classification datasets that are already labeled with these categories, or for models that have already been trained to recognize them; I don't know of any offhand.
If you do find one, things become much easier: you could write a UDF that runs each review's text through the model and returns the predicted label.
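As a sketch of what "run the text against the model and return the label" could look like, the UDF below wraps a toy multinomial naive Bayes classifier with add-one smoothing, written in plain Python so it also runs under Pig's Jython engine. The training sentences in `TRAINING_DATA` are invented for illustration only; a real model would be trained offline on a proper labeled dataset. Jython cannot import C-extension libraries such as scikit-learn, so a heavier model would more likely be called through Pig's `STREAM` operator or a Java UDF.

```python
import math
import re
from collections import Counter, defaultdict

# No-op fallback for the outputSchema decorator that Pig's Jython engine provides.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap

# Hypothetical labeled examples, standing in for a real labeled dataset.
TRAINING_DATA = [
    ("my dog loves these biscuits and treats", "pet food"),
    ("the cat food kept my kitten healthy", "pet food"),
    ("battery life on this laptop is great", "electronics"),
    ("the headphones broke after a week", "electronics"),
    ("a gripping novel with a great author", "books"),
    ("the paperback arrived with a torn chapter", "books"),
]

def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def train(examples):
    """Count documents per label and word occurrences per label."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for word in tokenize(text):
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab

# Train once when the script is loaded.
CLASS_COUNTS, WORD_COUNTS, VOCAB = train(TRAINING_DATA)
TOTAL_DOCS = sum(CLASS_COUNTS.values())

def predict(text):
    """Return the label with the highest naive Bayes log-probability."""
    words = [w for w in tokenize(text) if w in VOCAB]  # drop words never seen in training
    if not words:
        return "unknown"
    best_label, best_score = "unknown", float("-inf")
    for label, doc_count in CLASS_COUNTS.items():
        total_words = sum(WORD_COUNTS[label].values())
        score = math.log(doc_count / float(TOTAL_DOCS))  # log prior
        for word in words:
            # Add-one (Laplace) smoothed log likelihood of each word given the label.
            score += math.log((WORD_COUNTS[label][word] + 1.0) / (total_words + len(VOCAB)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

@outputSchema("category:chararray")
def categorize(review):
    """Pig UDF entry point: map a review string to a predicted category."""
    if review is None:
        return "unknown"
    return predict(review)
```

It is registered and called from Pig the same way as any other Jython UDF (`REGISTER ... USING jython AS ...;` then `FOREACH ... GENERATE`). With six training sentences the predictions are only a demonstration; accuracy depends entirely on the size and quality of the labeled data.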
I know this isn't the answer you're looking for, but I hope it puts you on the right path.