Support Questions

Manus · 10-17-2016

Hi Guys,

I am new to machine learning. I have a dataset of clinical trials that contains both textual and numerical data (I have converted all the textual features to numeric ones using the DictVectorizer class from Python's scikit-learn library).
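Converting textual fields to numbers, as described above, usually means one-hot encoding: each distinct text value becomes its own 0/1 column, while numeric fields pass through unchanged. The pure-Python sketch below illustrates the idea on two hypothetical trial records. It is an illustration of the technique, not scikit-learn's DictVectorizer code, and the field names ("phase", "sites") are made up.

```python
# Minimal sketch of one-hot encoding text fields, similar in spirit to
# scikit-learn's DictVectorizer (this is NOT that library's implementation).
# The field names below ("phase", "sites") are hypothetical examples.

def one_hot_encode(records):
    """Turn a list of dicts into (feature_names, rows of floats).

    String values become indicator columns named "key=value";
    numeric values are kept as-is in a column named after the key.
    """
    names = set()
    for rec in records:
        for key, value in rec.items():
            names.add(f"{key}={value}" if isinstance(value, str) else key)
    feature_names = sorted(names)
    index = {name: i for i, name in enumerate(feature_names)}

    rows = []
    for rec in records:
        row = [0.0] * len(feature_names)
        for key, value in rec.items():
            if isinstance(value, str):
                row[index[f"{key}={value}"]] = 1.0  # indicator for this text value
            else:
                row[index[key]] = float(value)      # numeric value passes through
        rows.append(row)
    return feature_names, rows


trials = [
    {"phase": "Phase 2", "sites": 3},
    {"phase": "Phase 3", "sites": 10},
]
names, X = one_hot_encode(trials)
print(names)  # ['phase=Phase 2', 'phase=Phase 3', 'sites']
print(X)      # [[1.0, 0.0, 3.0], [0.0, 1.0, 10.0]]
```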

I have attached the dataset CSV file as well as a Jupyter (Python) notebook. Please check them.

If you want a description of the dataset, please visit the link below. I have used the same public data from the clinicaltrials.gov website.

https://clinicaltrials.gov/ct2/about-studies/glossary

Problem statement: the dataset contains an "ENROLLMENT" column (the number of participants required for a clinical study), and I want my algorithm to predict "ENROLLMENT" from the training data.

Before opening the files, please change the extension of ct_gov_results from .txt to .csv and that of temporary_notebook from .txt to .ipynb.

Issue: I am getting an RMSE of roughly 3000, which does not seem like a good value. As far as I know, its value should lie between 0 and 1.

I don't understand how to reduce it so that my algorithm works well on my data.

Please do respond; your reply will be very valuable to me.

Thanks in advance.

mrizvi · 11-08-2016

@Manoj Dhake, it depends on the dependent variable. RMSE is expressed in the same units as the dependent variable. If your data ranges from 0 to 100000, an RMSE of 3000 is small, but if it ranges from 0 to 1, it is huge. Try experimenting with other input variables and compare the resulting RMSE values: the smaller the RMSE, the better the model.
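To make the point about units concrete, here is a small sketch with made-up enrollment numbers. It computes RMSE as the square root of the mean squared difference between observed and predicted values, then shows that dividing the target by a constant divides RMSE by the same constant. The last line computes a normalized RMSE (RMSE divided by the observed range of the target), which is one common convention for getting a unit-free figure; it is an added illustration, not the only way to normalize.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as y_true."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Hypothetical enrollment counts and predictions (made-up numbers).
y_true = [100.0, 400.0, 900.0]
y_pred = [130.0, 360.0, 950.0]

raw = rmse(y_true, y_pred)

# Rescaling the target (and the predictions) by a constant rescales RMSE by it too.
scale = 1000.0
scaled = rmse([t / scale for t in y_true], [p / scale for p in y_pred])

# Normalized RMSE: RMSE divided by the range of the observed target.
nrmse = raw / (max(y_true) - min(y_true))

print(round(raw, 2))     # 40.82  (in "participants")
print(round(scaled, 4))  # 0.0408 (in "thousands of participants")
print(round(nrmse, 4))   # 0.051  (unit-free)
```

The model's errors are identical in all three cases; only the units of the number change.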

Also, compare the RMSE on the training data with the RMSE on the test data. If they are similar, your model generalizes well. If the test RMSE is much higher than the training RMSE, you have most likely overfit the data badly.
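The train-versus-test comparison can be sketched in plain Python. This is a minimal illustration under stated assumptions: a single hypothetical feature (number of study sites), synthetic data generated from a known linear relationship, a simple 75/25 split, and closed-form least squares in place of scikit-learn. With the real clinical-trial data you would use the actual feature matrix and your own model.

```python
import math
import random

def fit_simple_linear(xs, ys):
    """Ordinary least squares for y = a + b*x (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as y_true."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

# Synthetic, hypothetical data: enrollment grows with number of sites, plus noise (sd 30).
rng = random.Random(0)
xs = [rng.uniform(1, 50) for _ in range(200)]
ys = [20.0 + 15.0 * x + rng.gauss(0, 30) for x in xs]

# Hold out the last 25% of rows as a test set (rows are already in random order).
split = int(0.75 * len(xs))
x_train, x_test = xs[:split], xs[split:]
y_train, y_test = ys[:split], ys[split:]

# Fit on the training rows only, then score both sets with the same coefficients.
a, b = fit_simple_linear(x_train, y_train)
train_rmse = rmse(y_train, [a + b * x for x in x_train])
test_rmse = rmse(y_test, [a + b * x for x in x_test])

print(f"train RMSE = {train_rmse:.1f}, test RMSE = {test_rmse:.1f}")
```

Because this synthetic data really is linear, both values should land near the noise level of 30 and close to each other. On real data, a test RMSE far above the training RMSE is the overfitting signal described above.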

Brajesh · 02-24-2021

"If your data has a range of 0 to 100000 then RMSE value of 3000 is small, but if the range goes from 0 to 1." What does "the range goes from 0 to 1" mean?

Cloudera Community


How to reduce RMSE (Root Mean Squared Error) value for linear regression in machine learning?