Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant

Scoring data on Hadoop with Oryx at large scale

Frequent Visitor

Hi,

I was wondering whether it is possible to classify data at large scale on Hadoop, within the computation layer (as a MapReduce task). As far as I understand, this works for recommendations (there is a "recommend" property for als-model in the config file), but how would it work with other models, such as random forest?

Thank you

1 ACCEPTED SOLUTION

Master Collaborator

Bad news: not directly. The design goal here is real-time scoring. You could write a process that queries an embedded Serving Layer, or calls one via HTTP. It's a bit more overhead, but it certainly works.
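
For the HTTP route, a rough sketch of such a process might look like the following. It reads comma-separated records and asks a running Serving Layer for a prediction, one request per record. The host and port, the /classify/<datum> path, and the names classify_url, http_get and score_lines are all assumptions made for illustration; check the endpoint your Serving Layer version actually exposes.

```python
import urllib.parse
import urllib.request

# Hypothetical Serving Layer address; adjust to your deployment.
SERVING_URL = "http://localhost:8080"


def classify_url(base_url, record):
    """Build the request URL for one comma-separated record.

    The /classify/<datum> path is an assumption, not a confirmed endpoint.
    """
    return base_url.rstrip("/") + "/classify/" + urllib.parse.quote(record, safe=",")


def http_get(url):
    """Fetch a URL and return the response body as stripped text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8").strip()


def score_lines(lines, base_url=SERVING_URL, fetch=http_get):
    """Yield (record, prediction) pairs, one HTTP call per non-empty line."""
    for line in lines:
        record = line.strip()
        if not record:
            continue
        yield record, fetch(classify_url(base_url, record))
```

Feeding it sys.stdin and printing record, tab, prediction would make it usable as a Hadoop Streaming mapper, at the cost of one HTTP round trip per record.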

The bulk recommend function is a hold-over from the older code base, really. There wasn't an equivalent for classification.

Good news: since the output is a PMML model, and libraries like openscoring exist, you could fairly easily wire up a Mapper that loads a model and scores data.
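
To make the Mapper idea concrete, here is a minimal sketch of the load-once, score-per-record pattern, written in Python as it might run under Hadoop Streaming. It is not the JPMML/openscoring API: it is a tiny stand-in evaluator that assumes the forest is exported as TreeModel elements combined by majority vote, and it handles only True and SimplePredicate. The names load_model, score and map_lines, the field list, and the trimmed SAMPLE_PMML (not schema-valid) are hypothetical. A real job would hand evaluation to a proper PMML library.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, heavily trimmed model: three one-split trees.
SAMPLE_PMML = """<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
 <MiningModel functionName="classification">
  <Segmentation multipleModelMethod="majorityVote">
   <Segment><True/><TreeModel><Node score="no"><True/>
    <Node score="yes"><SimplePredicate field="age" operator="greaterThan" value="30"/></Node>
   </Node></TreeModel></Segment>
   <Segment><True/><TreeModel><Node score="no"><True/>
    <Node score="yes"><SimplePredicate field="color" operator="equal" value="red"/></Node>
   </Node></TreeModel></Segment>
   <Segment><True/><TreeModel><Node score="no"><True/>
    <Node score="yes"><SimplePredicate field="age" operator="lessOrEqual" value="50"/></Node>
   </Node></TreeModel></Segment>
  </Segmentation>
 </MiningModel>
</PMML>"""

_OPS = {
    "equal": lambda a, b: a == b,
    "notEqual": lambda a, b: a != b,
    "lessThan": lambda a, b: a < b,
    "lessOrEqual": lambda a, b: a <= b,
    "greaterThan": lambda a, b: a > b,
    "greaterOrEqual": lambda a, b: a >= b,
}


def _local(tag):
    """Strip the XML namespace from a tag name."""
    return tag.rsplit("}", 1)[-1]


def _child_nodes(elem):
    return [c for c in elem if _local(c.tag) == "Node"]


def _matches(node, record):
    """Evaluate a node's predicate (True or SimplePredicate only)."""
    for child in node:
        name = _local(child.tag)
        if name == "True":
            return True
        if name == "SimplePredicate":
            left, right = record[child.get("field")], child.get("value")
            try:
                left, right = float(left), float(right)
            except ValueError:
                pass  # categorical value: compare as strings
            return _OPS[child.get("operator")](left, right)
    return False


def score_tree(tree, record):
    """Walk down from the root; return the score of the last matching node."""
    node = _child_nodes(tree)[0]
    while True:
        nxt = next((c for c in _child_nodes(node) if _matches(c, record)), None)
        if nxt is None:
            return node.get("score")
        node = nxt


def load_model(pmml_text):
    """Parse the PMML once (the Mapper's setup step) and collect its trees."""
    root = ET.fromstring(pmml_text)
    return [e for e in root.iter() if _local(e.tag) == "TreeModel"]


def score(trees, record):
    """Majority vote across all trees."""
    return Counter(score_tree(t, record) for t in trees).most_common(1)[0][0]


def map_lines(lines, trees, fields):
    """The map step: emit 'input<TAB>prediction' for each well-formed CSV line."""
    for line in lines:
        raw = line.rstrip("\n")
        values = raw.split(",")
        if len(values) != len(fields):
            continue  # skip malformed records
        yield raw + "\t" + score(trees, dict(zip(fields, values)))
```

The point of the split is that load_model runs once per task while map_lines runs per record, so the model is not re-parsed for every input line.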
