@Gayathri Devi, there is no direct method available for detecting outliers, but you can use a quantile-based approach to determine lower and upper bounds and filter the data against them. Once the data is filtered, you can build an ML Pipeline with all the transformations required to run machine learning models (regression, classification, etc.).
Here is an example approach:
- Convert string fields to a numeric representation using StringIndexer
- Assemble the indexed string fields and the numeric fields into a single feature vector using VectorAssembler
- Create a linear/logistic regression model
- Create an ML Pipeline with the StringIndexer stages, the VectorAssembler and the model, and fit it on the training data
- Use the trained model to make predictions on the test data
- Create an evaluator and evaluate the predictions made on the test data
Please note that the above approach is based on the DataFrame-based ML library (spark.ml) rather than the RDD-based MLlib (spark.mllib).
Thanks
Kiran