Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Is there a way to find the weights of every feature in spark ml model?

Solved Go to solution
Highlighted

Is there a way to find the weights of every feature in spark ml model?

New Contributor

Hi,

I am using spark ml classifier to create a model to predict whether a user will buy a product or not and I am using features both continuous and categorical. The model is predicting the outcome with some accuracy. Is there a way to see how much each feature is contributing to the prediction (i.e) the weight of each feature in the model created from training data? I am using spark 2.0.0 and all the classifiers available in it

Thanks in Advance

1 ACCEPTED SOLUTION

Accepted Solutions

Re: Is there a way to find the weights of every feature in spark ml model?

@Rajasekaran Dhanasekaran How to do this is going to vary a lot of from model to model. So, e.g., RandomForestRegressionModel has .featureImportances whereas LogisticRegressionModel only has the feature weight matrix .weights.

To make things worse, there are very specific conditions under which it's possible to interpret model coefficients; if any features are correlated, "multicollinearities" arise and a parameter might be bigger or smaller, significant or non-significant depending on the correlated one. So it's not necessarily meaningful if you don't ensure there's no multicollinearity. Overall a model can still perform well even if this is the case though so you have to be careful. If you're certain that you've eliminated such correlations, then in regression regimes the model coefficients can be usually be interpreted somewhat directly. Since it sounds like you're doing binary classification (buy or not buy) you'll want to look into "how to interpret logistic regression coefficients".

If you're throwing hundreds or thousands of features at your problem, interpreting the coefficients isn't going to work well though. So, if you'd like to better understand how your features explain your data, I'd recommend looking at things like the ChiSqSelector feature selector or dimensionality reduction techniques such as PCA. You can use these at development time to more perspicuously answer questions like "what features best explain my data?".

1 REPLY 1

Re: Is there a way to find the weights of every feature in spark ml model?

@Rajasekaran Dhanasekaran How to do this is going to vary a lot of from model to model. So, e.g., RandomForestRegressionModel has .featureImportances whereas LogisticRegressionModel only has the feature weight matrix .weights.

To make things worse, there are very specific conditions under which it's possible to interpret model coefficients; if any features are correlated, "multicollinearities" arise and a parameter might be bigger or smaller, significant or non-significant depending on the correlated one. So it's not necessarily meaningful if you don't ensure there's no multicollinearity. Overall a model can still perform well even if this is the case though so you have to be careful. If you're certain that you've eliminated such correlations, then in regression regimes the model coefficients can be usually be interpreted somewhat directly. Since it sounds like you're doing binary classification (buy or not buy) you'll want to look into "how to interpret logistic regression coefficients".

If you're throwing hundreds or thousands of features at your problem, interpreting the coefficients isn't going to work well though. So, if you'd like to better understand how your features explain your data, I'd recommend looking at things like the ChiSqSelector feature selector or dimensionality reduction techniques such as PCA. You can use these at development time to more perspicuously answer questions like "what features best explain my data?".