01:52 AM
It includes an implementation of classification using random decision forests. Decision forests support both categorical and numeric features. For text classification, though, you're correct that you would typically transform the text into numeric vectors via TF-IDF first, and that is something you'd have to do separately. Yes, the dimensionality is high. Decision forests can cope with this, but they're not the most natural choice for text classification.

This may show what I mean when I say that Oryx is not a tool for classification but a tool for productionizing, one that happens to include an implementation of a classifier.

2.x also has an implementation of decision forests, and likewise has no TF-IDF built in. However, its architecture is much more supportive of putting your own Spark-based pipeline and model build into the framework. 1.x did not support this.
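To make the "do TF-IDF separately" step concrete, here is a minimal sketch in plain Python of what that transformation produces. It is not Oryx or Spark code: the function names (`tokenize`, `tf_idf_vectors`), the whitespace tokenizer, the smoothed IDF formula, and the sample sentences are all assumptions chosen for illustration. In practice you would probably use an existing library for this step rather than rolling your own.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def tf_idf_vectors(documents):
    """Return (vocabulary, vectors) with one dense TF-IDF vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)

    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed inverse document frequency (an assumed variant):
    # log((N + 1) / (df + 1)) + 1
    idf = {
        term: math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1.0
        for term in vocabulary
    }

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # guard against empty documents
        # Term frequency (relative count) times IDF, in vocabulary order.
        vectors.append([counts[term] / total * idf[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical sample documents, for illustration only.
docs = [
    "spark builds the model",
    "the model serves predictions",
    "text becomes numeric vectors",
]
vocabulary, vectors = tf_idf_vectors(docs)
```

Each vector has one entry per distinct term in the corpus, which is where the high dimensionality comes from: a real vocabulary easily runs to tens of thousands of terms, and most entries in any one vector are zero. The resulting numeric vectors are what you would then feed to a classifier such as a decision forest.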