Scoring data on hadoop with Oryx at large scale

New Contributor

Hi,

I was wondering whether it is possible to classify data at large scale on Hadoop, within the computation layer (as a MapReduce task). As far as I understand, this works for recommendations (there is a "recommend" property for als-model in the config file), but how would it work with other models, such as random forest?

Thank you

Accepted Solutions

Re: Scoring data on hadoop with Oryx at large scale

Master Collaborator

Bad news: not directly. The design goal here is real-time scoring. You could write a process that queries an embedded Serving Layer, or one that calls a Serving Layer over HTTP. It adds some overhead, but it certainly works.

The bulk recommend function is really a holdover from the older code base. There was never an equivalent for classification.

Good news: since the output is a PMML model, and libraries like openscoring exist, you could fairly easily wire up a Mapper that loads the model and scores data.
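
To make the Mapper idea a bit more concrete, here is a rough sketch of the per-record scoring loop, written as a Hadoop Streaming-style mapper in Python rather than a Java Mapper. Everything named here is an assumption for illustration: the FEATURE_NAMES column layout, the comma separator, and placeholder_scorer, which only stands in for a real PMML evaluator so that the sketch runs. None of this is part of Oryx itself.

```python
import sys
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical column layout of the input records; replace with the real schema.
FEATURE_NAMES = ["f1", "f2", "f3"]

# A scorer maps one record (feature name -> raw string value) to a predicted label.
Scorer = Callable[[Dict[str, str]], str]


def score_lines(lines: Iterable[str], scorer: Scorer, sep: str = ",") -> Iterator[str]:
    """Yield 'record<TAB>prediction' for each well-formed input line."""
    for line in lines:
        record = line.rstrip("\n")
        if not record:
            continue  # ignore blank lines
        values = record.split(sep)
        if len(values) != len(FEATURE_NAMES):
            continue  # skip records that do not match the assumed schema
        features = dict(zip(FEATURE_NAMES, values))
        yield f"{record}\t{scorer(features)}"


def placeholder_scorer(features: Dict[str, str]) -> str:
    # Stand-in only: a real job would load the PMML file once per task
    # and evaluate it here with a PMML library instead of this threshold.
    return "positive" if float(features["f1"]) > 0.5 else "negative"


def main() -> None:
    # Hadoop Streaming feeds input records on stdin and collects stdout.
    for out in score_lines(sys.stdin, placeholder_scorer):
        print(out)
```

A streaming job would call main() at the bottom of the script. The point of keeping the scorer as a plain function argument is that the model gets loaded once per task, not once per record, and that the same loop could just as well wrap an HTTP call to a Serving Layer.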
