
Classifying music lyrics


Hi, let me briefly describe my problem:

The initial task was to try some NLP practices on close-to-real-world problems. We decided to start with a simple classification problem: predict the genre of a song lyric. We had a strong requirement to use Java and Spark 2.0.0 with the ML library (not MLlib). The ML library has a limited number of algorithms, so we started with binary classification for 2 genres and a simple pipeline with Word2Vec and Logistic Regression. It showed acceptable results. Then we decided to add one more genre, so we had to switch to some other algorithm, because Logistic Regression in Spark 2.0.0 ML works only for binary problems. We've tried 3 approaches:

1. Bag of Words + Naive Bayes: around 82% precision
2. Word2Vec + Logistic Regression + One-vs-Rest: around 65% precision
3. Word2Vec + MinMaxScaler + Naive Bayes: around 58% precision

The first approach showed good results, but our concern is that Bag of Words is a bit dated and doesn't handle similar words well. So we're still searching for a better solution with Word2Vec. We thought about trying Decision Tree or Random Forest, but we're not sure how well they will perform with large vectors (100, 200, 300 dimensions) and large datasets. I read that tree-based approaches work well when you have a small number of features.
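For context on approach 3: Spark's Naive Bayes expects non-negative feature values, while Word2Vec vectors contain negative components, which is why a rescaling step sits between them. Below is a minimal plain-Java sketch of per-column min-max rescaling to [0, 1]. It is only an illustration of the idea, not Spark's implementation; the class name `MinMaxRescale`, the method `rescale`, and the sample vectors are hypothetical.

```java
import java.util.Arrays;

public class MinMaxRescale {

    // Rescales every column of `rows` to the range [0, 1].
    // Assumption of this sketch: a constant column is mapped to 0.5.
    static double[][] rescale(double[][] rows) {
        int n = rows.length;
        int d = rows[0].length;

        // Find the minimum and maximum of each column.
        double[] min = new double[d];
        double[] max = new double[d];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : rows) {
            for (int j = 0; j < d; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }

        // Map each value to (x - min) / (max - min).
        double[][] out = new double[n][d];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < d; j++) {
                double range = max[j] - min[j];
                out[i][j] = range == 0.0 ? 0.5 : (rows[i][j] - min[j]) / range;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up 2-dimensional "document vectors" with negative components.
        double[][] vectors = {{-0.4, 0.2}, {0.0, 0.6}, {0.4, -0.2}};
        System.out.println(Arrays.deepToString(rescale(vectors)));
    }
}
```

After rescaling, every component is non-negative, so the vectors satisfy Naive Bayes' input requirement. Note that the rescaled values are no longer counts, which may be one reason this combination scores lower than Bag of Words.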

Maybe you could recommend some other approach based on your experience? Any help is much appreciated. I have to remind you that we're strongly tied to Spark 2.0.0 + the ML library due to our DevOps infrastructure.
