Support Questions

verytest · ‎02-05-2014

Hello

- I start the computation and serving layers, set up for collaborative filtering as described in the Cloudera example:

model=${als-model}
model.instance-dir=/tmp/oryx/example
model.local-computation=true
model.local-data=true
model.features=25
model.lambda=0.065

- I start calling the /pref API to add some data (e.g. /pref/user1/xxxx), many times over, in order to get some data into Oryx.

- However, each time I call /recommend/user1, I get a 503 error: "com.cloudera.oryx.als.common.NotReadyException: API method unavailable until model has been built and loaded"

So am I missing something?

Regards

PS: the servers work correctly with the sample data (audioscrobbler), but my error occurs on a fresh install starting from zero.

srowen · ‎02-09-2014

This is good. There is no performance difference between computing 10 and 100 recommendations since it still considers all non-filtered items each time. (OK I suppose it takes a tiny bit longer to send 100 results over the network than 10.) The results are not precomputed but computed on the fly each time.
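
A minimal sketch of why the cost barely depends on howMany, assuming recommendations are scored as a dot product between a user's feature vector and every item's feature vector. The function and variable names are hypothetical and this is not Oryx's actual code; it only illustrates that every candidate item is scored on each call, and howMany only controls how many of the scored items are kept.

```python
import heapq


def recommend(user_vector, item_vectors, how_many=10):
    """Score every item for one user and keep the how_many best.

    item_vectors maps an item ID to its feature vector (illustrative data
    layout, not Oryx's internal representation).
    """
    # Every item is scored on every call, regardless of how_many.
    scored = (
        (sum(u * v for u, v in zip(user_vector, vector)), item_id)
        for item_id, vector in item_vectors.items()
    )
    # Only this selection step depends on how_many.
    top = heapq.nlargest(how_many, scored)
    return [(item_id, score) for score, item_id in top]


if __name__ == "__main__":
    items = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(recommend([2.0, 1.0], items, how_many=2))
```

Asking for more results than there are items simply returns all of them, ranked.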

srowen · ‎02-05-2014

That's right. Until the model is created, there is nothing with which to answer queries.

The default behavior is to wait until a certain amount of data has been written, and then build or rebuild the model. You can configure this.

But I would suggest simply forcing it when you are ready by calling /refresh.

Usually you would probably have some historical data to learn from before you launch. That would have been fed in, and created a model, before you would start querying it.
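
A minimal sketch of that ingest-then-refresh sequence from a client, using only the Python standard library. The base URL is a hypothetical placeholder, and the use of POST for /pref and /refresh is an assumption to check against the Serving Layer API documentation for your version.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

# Hypothetical Serving Layer address; replace with your own host and port.
BASE_URL = "http://localhost:8091"


def pref_request(user_id, item_id, base_url=BASE_URL):
    """Build (but do not send) a request recording one user-item preference."""
    # Assumption: /pref/{user}/{item} accepts POST.
    path = f"/pref/{quote(user_id, safe='')}/{quote(item_id, safe='')}"
    return Request(base_url + path, data=b"", method="POST")


def refresh_request(base_url=BASE_URL):
    """Build (but do not send) a request asking for a model rebuild."""
    # Assumption: /refresh accepts POST.
    return Request(base_url + "/refresh", data=b"", method="POST")


def ingest_then_refresh(prefs, base_url=BASE_URL):
    """Send every (user, item) preference, then force a single rebuild."""
    for user_id, item_id in prefs:
        urlopen(pref_request(user_id, item_id, base_url)).close()
    urlopen(refresh_request(base_url)).close()
```

Calling `ingest_then_refresh([("user1", "xxxx")])` against a running Serving Layer would send the preference and then request one rebuild; /recommend stays unavailable until that build finishes and the model is loaded.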

verytest · ‎02-05-2014

Thank you, srowen!

OK, I understand. So how do I know:

- the actual amount of data that has been written?

- the exact amount of data at which the model will be rebuilt?

Do I have to call /refresh manually, or is it done automatically? If so, how often?

Best regards

srowen · ‎02-05-2014

I would simply call /refresh after you have ingested whatever data you have already. It sounds like you want it to just start with what data it has at a certain point, and that is part of what /refresh does. After that, let it rebuild automatically.

The properties are model.time-threshold and model.data-threshold. These are measured in minutes and megabytes, respectively. A rebuild happens when either threshold is exceeded -- time elapsed, data written.

Actually, I misspoke; these default to -1, meaning, do not use a threshold. By default it would not rebuild except on demand. So you should set at least one of these to match your requirements.

Also, note that if you start the Computation Layer and there is no model, and the first generation has any data, it will force itself to run. So you could also simply stuff your data in the generation 00000/inbound dir ahead of time.
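
The threshold rule described above can be sketched as a small function. This is an illustration of the stated behavior only, not Oryx's implementation: the function name is hypothetical, and treating "exceeded" as strictly greater than, and any negative value as "disabled", are assumptions.

```python
def should_rebuild(elapsed_minutes, written_mb,
                   time_threshold=-1, data_threshold=-1):
    """Return True when either enabled threshold has been exceeded.

    time_threshold is in minutes and data_threshold in megabytes, mirroring
    model.time-threshold and model.data-threshold. The default of -1 means
    the threshold is not used.
    """
    time_hit = time_threshold >= 0 and elapsed_minutes > time_threshold
    data_hit = data_threshold >= 0 and written_mb > data_threshold
    return time_hit or data_hit


if __name__ == "__main__":
    # With the defaults, no automatic rebuild is ever triggered.
    print(should_rebuild(elapsed_minutes=600, written_mb=5000))
    # With a 5-minute time threshold, 6 elapsed minutes triggers a rebuild.
    print(should_rebuild(elapsed_minutes=6, written_mb=0, time_threshold=5))
```

With both properties left at -1 the function never returns True, which matches the point that a rebuild then only happens on demand.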

verytest · ‎02-08-2014

Hello

A few days later, it is still in the same state: no recommendations are generated, even though I call /pref/x/y often.

Just before this, I was using Myrrix in the same context and it was working. Instead of one Myrrix instance, I now have two instances (computation and serving), but the app using it is the same (calling the serving layer on the same port with the same APIs).

So I have doubts. How can I check that data is stored in Oryx each time I call the serving layer? I see in "EndPoint stats" that the RecommendServlet is called correctly without error, but how can I be sure the data is stored and ready for recommendation computation?

Regards

srowen · ‎02-08-2014

See my previous message. Did you ever call /refresh, or set a threshold that you have definitely exceeded? A model won't get built otherwise. The behavior is different, yes.

verytest · ‎02-08-2014

Yes, even after calling /refresh, /recommend/X returns a 503 error.

I have now also configured model.time-threshold = 5, as you mentioned, to make it rebuild every 5 minutes.

But I still get the 503 error.

srowen · ‎02-08-2014

/refresh can't throw a 503 error, at least, I can't see any way for it to do so. Are you sure you are not calling /ready?

Can you please provide the logs that are printed after calling /refresh? Or really, all your logs. You can mail them directly to me if you like at sowen @ cloudera

srowen · ‎02-08-2014

OK, the problem turned out to be that the number of features was too high for the tiny amount of data. The model was being built but rejected. It does log a message to this effect when this happens.
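
One way to see why a high feature count can fail on tiny data: a user-by-item matrix with U distinct users and I distinct items has rank at most min(U, I), so a factorization with more features than that cannot have full-rank factors. The thread does not say which check Oryx applies before rejecting a model, so the sketch below is only a rough pre-flight sanity check with hypothetical helper names, not Oryx's own test.

```python
def max_rank(prefs):
    """Upper bound on the rank of the user-item matrix built from prefs.

    prefs is an iterable of (user_id, item_id) pairs.
    """
    pairs = list(prefs)
    users = {user_id for user_id, _ in pairs}
    items = {item_id for _, item_id in pairs}
    return min(len(users), len(items))


def features_exceed_data(features, prefs):
    """Return True when the feature count is larger than the data can support."""
    return features > max_rank(prefs)


if __name__ == "__main__":
    prefs = [("user1", "a"), ("user1", "b"), ("user2", "a")]
    # Two users and two items: 25 features is far more than the data supports.
    print(features_exceed_data(25, prefs))
```

If this returns True for your data, lowering model.features or ingesting more data before the first build is worth trying; the computation layer log remains the authoritative place to confirm a rejected model.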

verytest · ‎02-09-2014

FYI, it is working now!

One more question: I can get N recommendations by calling /recommend/X with the howMany parameter.

Is there any performance difference for Oryx between howMany=10 and howMany=1000? Are the results already computed, so that Oryx just returns the data as is, without computing again?

Oryx: API method unavailable until model has been built and loaded