Scoring data on hadoop with Oryx at large scale

New Contributor

Hi,

I was wondering whether it is possible to classify data at large scale on Hadoop, within the computation layer (as a MapReduce task). As far as I understand, this works for recommendations (there is a "recommend" property for the ALS model in the config file), but how would it work with other models (random forest)?

Thank you

1 ACCEPTED SOLUTION

Master Collaborator

Bad news: not directly. The design goal here is real-time scoring. You could write a process that queries an embedded Serving Layer, or one that calls a running Serving Layer over HTTP. It's a bit more overhead, but it certainly works.
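
As a rough illustration of the HTTP route, here is a minimal Python sketch of a client that sends one record per request and collects the predictions. The host, port, and `/classify/` path are placeholders I made up for the example, not a confirmed Oryx endpoint, so check your Serving Layer's API before using them. The `fetch` parameter is only there so the HTTP call can be swapped out.

```python
import sys
import urllib.parse
import urllib.request

# Assumption: placeholder host and endpoint path, not a confirmed Oryx URL.
SERVING_URL = "http://serving-layer.example.com:8080/classify/"


def build_url(datum, base=SERVING_URL):
    """Percent-encode one input record and append it to the endpoint."""
    return base + urllib.parse.quote(datum, safe="")


def http_fetch(url, timeout=10):
    """Blocking GET that returns the response body as stripped text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()


def score_lines(lines, fetch=http_fetch, base=SERVING_URL):
    """Yield (record, prediction) for each non-empty input line."""
    for line in lines:
        datum = line.rstrip("\n")
        if not datum:
            continue
        yield datum, fetch(build_url(datum, base))


def main(stdin=sys.stdin, stdout=sys.stdout):
    """Stdin-to-stdout filter: writes 'record<TAB>prediction' per line."""
    for datum, prediction in score_lines(stdin):
        stdout.write(f"{datum}\t{prediction}\n")
```

Calling `main()` at the end of the script turns it into a line-oriented filter, which is the shape a Hadoop Streaming mapper takes. One request per record is the overhead mentioned above.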

The bulk recommend function is a hold-over from the older code base, really. There wasn't an equivalent for classification.

Good news: since the output is a PMML model, and libraries like openscoring exist, you could fairly easily wire up a Mapper that loads a model and scores data.
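
To make the "load a model, score each record" idea concrete, here is a toy sketch in Python of a streaming-style map step over a PMML file. It is not openscoring and not what Oryx does internally: it is a hand-rolled evaluator that only understands `TreeModel` nodes with `True` and `SimplePredicate` tests, takes a plain majority vote across all trees it finds, and ignores segment weights, set predicates, and missing-value strategies. The function names and the CSV input layout are assumptions for the example; a real job should hand the model to a full PMML evaluator.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Numeric comparison operators used by PMML SimplePredicate.
OPS = {
    "lessThan": lambda a, b: a < b,
    "lessOrEqual": lambda a, b: a <= b,
    "greaterThan": lambda a, b: a > b,
    "greaterOrEqual": lambda a, b: a >= b,
}


def local(tag):
    """Strip the XML namespace so any PMML 4.x version matches."""
    return tag.rsplit("}", 1)[-1]


def children(elem, name):
    """Direct child elements with the given local tag name."""
    return [c for c in elem if local(c.tag) == name]


def matches(node, record):
    """Evaluate a node's predicate (True and SimplePredicate only)."""
    for pred in node:
        kind = local(pred.tag)
        if kind == "True":
            return True
        if kind == "SimplePredicate":
            value = record.get(pred.get("field"))
            if value is None:
                return False
            op, target = pred.get("operator"), pred.get("value")
            if op == "equal":
                return value == target  # simplification: string comparison
            return OPS[op](float(value), float(target))
    return False


def score_tree(tree_model, record):
    """Walk one tree from root to leaf; None means no prediction."""
    node = children(tree_model, "Node")[0]
    if not matches(node, record):
        return None
    while True:
        kids = children(node, "Node")
        if not kids:
            return node.get("score")
        node = next((k for k in kids if matches(k, record)), None)
        if node is None:
            return None  # no child matched


def load_trees(pmml_text):
    """Parse PMML once (e.g. in mapper setup) and collect every TreeModel."""
    root = ET.fromstring(pmml_text)
    return [e for e in root.iter() if local(e.tag) == "TreeModel"]


def score(trees, record):
    """Unweighted majority vote over the trees' non-null predictions."""
    votes = Counter(s for s in (score_tree(t, record) for t in trees)
                    if s is not None)
    return votes.most_common(1)[0][0] if votes else None


def map_lines(lines, trees, fields):
    """Map step: CSV line in, 'line<TAB>prediction' out."""
    for line in lines:
        line = line.rstrip("\n")
        if line:
            record = dict(zip(fields, line.split(",")))
            yield f"{line}\t{score(trees, record)}"
```

The point of the structure is that the model is parsed once per task and then reused for every record, so the per-record cost is just the tree walk.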
