Scoring data on Hadoop with Oryx at large scale

New Contributor

Hi,

I was wondering whether it is possible to classify data at large scale on Hadoop, within the computation layer (as a MapReduce task). As far as I understand, this works for recommendations (there is a "recommend" property for the ALS model in the config file), but how would it work with other models (random forest)?

Thank you

1 ACCEPTED SOLUTION

Master Collaborator

Bad news: not directly. The design goal here is real-time scoring. You could write a process that queries an embedded Serving Layer, or one that calls a Serving Layer via HTTP. It's a bit more overhead, but it certainly works.
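
A minimal sketch of the HTTP option, using only the Python standard library. The `/classify/<datum>` path, the comma-separated datum format, and the plain-text response are assumptions here, not confirmed by this thread; check the API of the Serving Layer you actually run. The names `classify_url` and `classify_remote` and the host in the usage note are made up for illustration.

```python
import urllib.request
from urllib.parse import quote


def classify_url(base_url, datum):
    """Build the request URL. The /classify/<datum> path is an assumption."""
    return f"{base_url.rstrip('/')}/classify/{quote(datum, safe=',')}"


def classify_remote(base_url, datum, opener=urllib.request.urlopen, timeout=10):
    """Send one datum to a Serving Layer and return the response body as text.

    `opener` is injectable so the function can be exercised without a server.
    """
    with opener(classify_url(base_url, datum), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

A batch process would call something like `classify_remote("http://serving-host:8080", "5.1,3.5,1.4,0.2")` once per record (hypothetical host and port), which is where the extra overhead comes from: one HTTP round trip per datum.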

 

The bulk recommend function is a hold-over from the older code base, really. There wasn't an equivalent for classification.

 

Good news: since the output is a PMML model, and libraries like Openscoring exist, you could fairly easily wire up a Mapper that loads the model and scores data.
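
To make the idea concrete, here is a rough sketch of a mapper that loads a PMML model once and scores each input record. It is written as a Hadoop Streaming style script in Python rather than as a Java Mapper, and it swaps in a tiny hand-rolled evaluator instead of a real PMML library. That evaluator rests on loud assumptions: it only understands `TreeModel` elements with `True` and numeric `SimplePredicate` tests, combines trees by plain majority vote, and ignores everything else in the PMML specification (missing-value handling, weighted votes, categorical predicates). In a real job a full evaluator, such as the one Openscoring is built on, should do the scoring. All function names are made up for illustration.

```python
import operator
import sys
import xml.etree.ElementTree as ET
from collections import Counter

# Numeric comparison operators supported by this sketch (a small PMML subset).
OPS = {
    "equal": operator.eq,
    "lessThan": operator.lt,
    "lessOrEqual": operator.le,
    "greaterThan": operator.gt,
    "greaterOrEqual": operator.ge,
}


def local(el):
    """Element tag without its XML namespace."""
    return el.tag.rsplit("}", 1)[-1]


def children(el, name):
    return [c for c in el if local(c) == name]


def matches(node, row):
    """Evaluate the node's predicate against one record (numeric fields only)."""
    for c in node:
        if local(c) == "True":
            return True
        if local(c) == "SimplePredicate":
            compare = OPS[c.get("operator")]
            return compare(float(row[c.get("field")]), float(c.get("value")))
    return False  # predicate type not handled by this sketch


def score_tree(node, row):
    """Follow the first matching child until none matches; return that node's score."""
    while True:
        nxt = next((c for c in children(node, "Node") if matches(c, row)), None)
        if nxt is None:
            return node.get("score")
        node = nxt


def load_trees(pmml_source):
    """Return the root Node of every TreeModel found in the PMML document."""
    root = ET.fromstring(pmml_source)
    return [children(t, "Node")[0] for t in root.iter() if local(t) == "TreeModel"]


def classify(roots, row):
    """Majority vote over all trees (an assumption about how the forest combines)."""
    return Counter(score_tree(r, row) for r in roots).most_common(1)[0][0]


def run_mapper(pmml_source, header, lines, out=sys.stdout):
    """Score each CSV line and emit 'line<TAB>prediction'."""
    roots = load_trees(pmml_source)  # model is parsed once, not per record
    for line in lines:
        line = line.rstrip("\n")
        if line:
            row = dict(zip(header, line.split(",")))
            out.write(f"{line}\t{classify(roots, row)}\n")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: mapper.py model.pmml.xml col1,col2,...  (records arrive on stdin)
    with open(sys.argv[1], "rb") as f:
        run_mapper(f.read(), sys.argv[2].split(","), sys.stdin)
```

Run as a map-only streaming job, the model file would have to be shipped to each task alongside the script (for example with Hadoop's generic `-files` option). The same structure carries over to a Java Mapper: load the model in the setup step, score in the map step.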

