How to use Oryx 1 to detect spam email
Created ‎03-05-2016 02:08 PM
Hi Sean,
We are trying to use Oryx 1 to detect spam email. We have training data (spam emails with the subject and body as text).
Can we use Oryx 1 classification to solve such a problem?
If so, how?
Thanks.
Chien
Created ‎03-07-2016 01:52 AM
forests. Decision forests actually support both categorical and
numeric features. However, for text classification, you're correct
that you typically transform your text into numeric vectors via TF-IDF
first. This is something you'd have to do separately. Yes, the
dimensionality is high. Decision forests can be fine with this, but,
they're not the most natural choice for text classification.
You may see what I mean that Oryx is not a tool for classification,
but a tool for productionizing, which happens to have an
implementation of a classifier.
In 2.x, you also have an implementation of decision forests, and also
don't have magic TF-IDF built in or anything. However, the architecture
is much more supportive of putting your own Spark-based pipeline and
model build into the framework. 1.x did not support this.
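
To make the "do TF-IDF separately" step concrete, here is a minimal sketch in plain Python of turning email text into numeric vectors before handing them to any classifier. It is only an illustration under stated assumptions: the toy emails, the function names (`tokenize`, `fit_idf`, `tfidf_vector`) and the `(features, label)` row layout are all hypothetical, the smoothed IDF formula is just one common variant, and none of this is Oryx API. In practice you would more likely use an existing library or a Spark pipeline for this step.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def fit_idf(documents):
    """Build a vocabulary and smoothed IDF weights from the training documents."""
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document (document frequency).
        doc_freq.update(set(tokenize(doc)))
    vocab = {term: i for i, term in enumerate(sorted(doc_freq))}
    n_docs = len(documents)
    # Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return vocab, idf


def tfidf_vector(text, vocab, idf):
    """Turn one email into a dense TF-IDF vector; terms outside the vocabulary are ignored."""
    counts = Counter(t for t in tokenize(text) if t in vocab)
    total = sum(counts.values())
    vec = [0.0] * len(vocab)
    for term, count in counts.items():
        # Term frequency (relative count) times inverse document frequency.
        vec[vocab[term]] = (count / total) * idf[term]
    return vec


# Hypothetical toy training data: subject and body already joined into one string.
emails = [
    "Win a free prize now, click here",
    "Free money, claim your prize today",
    "Meeting agenda for Monday attached",
    "Lunch on Monday? Let me know",
]
labels = ["spam", "spam", "ham", "ham"]

vocab, idf = fit_idf(emails)

# One (numeric features, label) pair per email, ready to be written out
# in whatever input format the chosen classifier expects.
rows = [(tfidf_vector(text, vocab, idf), label) for text, label in zip(emails, labels)]

for features, label in rows:
    print(label, [round(x, 3) for x in features])
```

Each email becomes a vector with one slot per vocabulary term, which is why the dimensionality grows quickly on real mail. The vocabulary and IDF weights come from the training set only, and new emails are vectorized with those same weights.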
Created ‎03-06-2016 11:34 AM
kind of learning system. Just making a model is something you should
do with other tools whose purpose is to build models. Oryx 1 is not
exactly deprecated, but Oryx 2 is the only version in active
development, and I'd really encourage you to look there. The good news
is that it's a lot easier in 2.x to reuse a model building process you
created in, say, Spark. In 1.x it's not possible.
Created ‎03-06-2016 01:38 PM
Thanks for the reply.
We will try Oryx 2. However, my question is how to use Oryx (1 or 2) to support spam email classification?
From the Oryx classification example, it looks like it requires training examples with numeric values, while the email body is text. One way is to convert the email body into TF-IDF features, which gives numeric values that the Oryx classifier can use.
The vector dimensionality looks too high, though. Any suggestions? Thanks.
