Archives of Support Questions (Read Only)

Lingn · ‎07-29-2014

Hi,

I was wondering whether it is possible to classify data at large scale on Hadoop, within the Computation Layer (as a MapReduce task). As far as I understand, this works for recommendations (there is a "recommend" property for als-model in the config file), but how would it work with other models, such as random forest?

Thank you

srowen · ‎07-29-2014

Bad news: not directly. The design goal here is real-time scoring. You could write a process that queries an embedded Serving Layer, or one that calls a Serving Layer via HTTP. It adds a bit of overhead, but it certainly works.

The bulk recommend function is a hold-over from the older code base, really. There wasn't an equivalent for classification.

Good news: since the output is a PMML model, and libraries like openscoring exist, you could fairly easily wire up a Mapper that loads a model and scores data.
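As a rough illustration of the "Mapper that loads a model and scores data" idea, here is a minimal sketch written as a Hadoop Streaming style mapper in Python. It is not Oryx or openscoring code. It assumes comma-separated input lines and a PMML file containing a single TreeModel whose nodes use only True and SimplePredicate tests. The tiny evaluator is a stand-in for a real PMML library such as JPMML, and the names run_mapper, load_tree, and score are hypothetical. A random forest is typically exported as an ensemble of many trees, which this sketch does not handle.

```python
import sys
import xml.etree.ElementTree as ET

# Comparison operators for PMML SimplePredicate; only a subset is handled here.
OPS = {
    "equal": lambda a, b: str(a) == b,
    "lessThan": lambda a, b: float(a) < float(b),
    "lessOrEqual": lambda a, b: float(a) <= float(b),
    "greaterThan": lambda a, b: float(a) > float(b),
    "greaterOrEqual": lambda a, b: float(a) >= float(b),
}


def local(tag):
    """Strip the XML namespace so the code does not depend on the PMML version."""
    return tag.rsplit("}", 1)[-1]


def load_tree(pmml_text):
    """Return the root Node of the first TreeModel in the PMML document."""
    for el in ET.fromstring(pmml_text).iter():
        if local(el.tag) == "TreeModel":
            for child in el:
                if local(child.tag) == "Node":
                    return child
    raise ValueError("no TreeModel root Node found")


def matches(node, record):
    """Evaluate the node's predicate (True or SimplePredicate) against a record."""
    for child in node:
        name = local(child.tag)
        if name == "True":
            return True
        if name == "SimplePredicate":
            op = OPS[child.get("operator")]
            return op(record[child.get("field")], child.get("value"))
    return False


def score(root, record):
    """Walk down the tree, following the first matching child at each level."""
    node, result = root, root.get("score")
    while True:
        for child in node:
            if local(child.tag) == "Node" and matches(child, record):
                node, result = child, child.get("score", result)
                break
        else:
            # No child matched (or this is a leaf): return the last score seen.
            return result


def run_mapper(pmml_text, fields, lines, out):
    """Load the model once, then emit 'input<TAB>prediction' for each input line."""
    root = load_tree(pmml_text)
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        record = dict(zip(fields, line.split(",")))
        out.write(f"{line}\t{score(root, record)}\n")


# Hypothetical invocation: mapper.py model.pmml field1,field2,...
if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f:
        run_mapper(f.read(), sys.argv[2].split(","), sys.stdin, sys.stdout)
```

The point of the structure is that the model is parsed once per mapper process and every input line is then scored independently, so the job needs no reducer. For real models, swapping the stand-in evaluator for a proper PMML library is the safer route.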

Cloudera Community

Scoring data on hadoop with Oryx at large scale