I am new to machine learning and have a dataset of clinical trials. It contains both textual and numerical data (I have converted all the textual features to numeric ones using DictVectorizer in Python).
I have attached the dataset as a CSV file, along with a Jupyter (Python) notebook. Please take a look.
If you want a description of the dataset, please visit the link below. I have used the same public data from the clinicaltrials.gov website.
Problem Statement: The dataset contains an "ENROLLMENT" column (the number of participants required for a clinical study), and I want my algorithm to predict "ENROLLMENT" based on the training data.
Before opening the files, please change the extension from .txt to .csv for ct_gov_results and from .txt to .ipynb for temporary_notebook.
Issue: I am getting an RMSE of around 3000, which does not seem like a good value. As far as I know, its value should be between 0 and 1.
I don't understand how to reduce it so that my algorithm works well on my data.
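To make the metric concrete, here is a minimal sketch of how RMSE is computed. It is not the code from my notebook: the function names and the enrollment numbers are made up for illustration. It also prints RMSE divided by the range of the true values, which is one way to get a unitless figure; whether that is the right number to compare against 0 and 1 is an assumption on my part.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, expressed in the same units as the target."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def range_normalized_rmse(y_true, y_pred):
    """RMSE divided by (max - min) of the true values, so it has no units."""
    spread = max(y_true) - min(y_true)
    if spread == 0:
        raise ValueError("true values have zero range")
    return rmse(y_true, y_pred) / spread


# Hypothetical enrollment counts, for illustration only (not from my dataset).
actual = [20, 150, 3000, 45, 12000]
predicted = [60, 100, 2500, 400, 9000]

print("RMSE:", rmse(actual, predicted))
print("Range-normalized RMSE:", range_normalized_rmse(actual, predicted))
```

With targets in the thousands, as in these made-up numbers, the plain RMSE also comes out in the thousands, while the normalized value is much smaller.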
Please respond; your reply will be very valuable to me.
Thanks in advance.