About JasonChen1114

srowen · ‎03-07-2016

Oryx 1 includes an implementation of classification using random decision forests. Decision forests support both categorical and numeric features. However, for text classification, you're correct that you typically transform your text into numeric vectors via TF-IDF first. This is something you'd have to do separately. Yes, the dimensionality is high. Decision forests can cope with this, but they're not the most natural choice for text classification.

You may see what I mean when I say that Oryx is not a tool for classification, but a tool for productionizing, which happens to have an implementation of a classifier.

Oryx 2.x also has an implementation of decision forests, and likewise doesn't have TF-IDF built in. However, its architecture is much more supportive of putting your own Spark-based pipeline and model build into the framework. 1.x did not support this.
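To make the separate TF-IDF step concrete, here is a minimal sketch in plain Python, not anything from Oryx itself. The function names (`tokenize`, `build_tfidf`), the sample emails, and the smoothed IDF formula are all assumptions chosen for illustration; a real pipeline would more likely use a library implementation (in Spark, for example) and better tokenization.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for this sketch);
    # real text would need punctuation and stop-word handling.
    return text.lower().split()


def build_tfidf(docs):
    """Return (vocab, vectors): one dense TF-IDF vector per document."""
    tokenized = [tokenize(d) for d in docs]
    # One vector dimension per distinct term, in sorted order.
    vocab = sorted({t for toks in tokenized for t in toks})
    n_docs = len(docs)
    # Document frequency: how many documents contain each term.
    df = Counter(t for toks in tokenized for t in set(toks))
    # Smoothed IDF, one common variant: log((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # guard against empty documents
        # Term frequency (relative count) times IDF for every vocab term.
        vectors.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, vectors


# Hypothetical sample data, purely illustrative.
emails = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "lunch on monday",
]
vocab, vectors = build_tfidf(emails)
```

Each vector has one entry per vocabulary term, which is where the high dimensionality comes from: even these four short messages produce 12 features, and most entries in any one row are zero.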

Last Visited	‎03-13-2016 01:55 PM

Member Since	‎03-05-2016 02:05 PM
Posts	2

Cloudera Community

Re: How to use Oryx 1 to detect spam email