During the clean-up process of sentiment analysis, we have selected an equal number of happy and sad tweets to prevent bias in the model. Is it possible to keep all data without removing the excess amount and still prevent bias in another way?

In the data preprocessing step, we select an equal number of happy and sad tweets, which discards a considerable amount of data. Is it possible to continue with unequal numbers of happy and sad tweets while still keeping the model unbiased?
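
To make the question concrete: one commonly used alternative to discarding the excess tweets is to keep every tweet and weight each class inversely to its frequency, so that both classes contribute the same total weight during training. The sketch below is only an illustration of that idea. The label counts are made up, the helper name `balanced_class_weights` is hypothetical, and it assumes the training library accepts per-sample weights.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1, so every
    class ends up with the same total weight.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical imbalanced dataset: three happy tweets for every sad one.
labels = ["happy"] * 7500 + ["sad"] * 2500

weights = balanced_class_weights(labels)

# One weight per tweet, to be passed to a trainer that supports sample weights.
sample_weights = [weights[label] for label in labels]

# Total weight contributed by each class after reweighting.
total_per_class = {
    cls: sum(w for w, label in zip(sample_weights, labels) if label == cls)
    for cls in weights
}
```

With these weights no tweet is thrown away, but each sad tweet counts more than each happy tweet in the loss. Whether this actually removes the bias depends on the model and on how it is evaluated, so metrics such as per-class recall would still need to be checked.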