Created 02-05-2014 05:02 AM
Hello
- I started the Computation and Serving Layers, configured for collaborative filtering as described in the Cloudera example:
model=${als-model}
model.instance-dir=/tmp/oryx/example
model.local-computation=true
model.local-data=true
model.features=25
model.lambda=0.065
- I started calling the /pref API many times to add some data (e.g. /pref/user1/xxxx), in order to get some data into Oryx.
- However, each time I call /recommend/user1, I get a 503 error: "com.cloudera.oryx.als.common.NotReadyException: API method unavailable until model has been built and loaded"
Am I missing something?
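The two calls described above can be sketched with the Python standard library. This is a minimal sketch, not the official client: the base URL is a hypothetical placeholder, and it assumes /pref is a POST whose body carries the preference strength, and that /recommend answers HTTP 503 while no model is loaded.

```python
import urllib.error
import urllib.request
from typing import Optional
from urllib.parse import quote

# Hypothetical Serving Layer address; adjust to your deployment.
BASE_URL = "http://localhost:8080"


def pref_request(user: str, item: str, value: float = 1.0) -> urllib.request.Request:
    """Build a POST to /pref/{user}/{item} (assumption: strength sent as the body)."""
    return urllib.request.Request(
        f"{BASE_URL}/pref/{quote(user, safe='')}/{quote(item, safe='')}",
        data=str(value).encode("utf-8"),
        method="POST",
    )


def recommend(user: str) -> Optional[str]:
    """GET /recommend/{user}; return None while the model is not ready (HTTP 503)."""
    try:
        with urllib.request.urlopen(f"{BASE_URL}/recommend/{quote(user, safe='')}") as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as e:
        if e.code == 503:  # NotReadyException: no model built and loaded yet
            return None
        raise
```

Sending a preference is then `urllib.request.urlopen(pref_request("user1", "xxxx"))`; a `None` from `recommend` is the "not ready" case reported here.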
Regards
PS: the servers work correctly with the sample data (Audioscrobbler), but this error occurs on a fresh install starting from zero.
Created 02-09-2014 06:28 AM
Created 02-05-2014 05:47 AM
That's right. Until the model is created, there is nothing with which to answer queries.
The default behavior is to wait until a certain amount of data has been written, and then build or rebuild the model. You can configure this.
But I would suggest simply forcing it when you are ready, by calling /refresh.
Usually you would have some historical data to learn from before you launch. That data would have been fed in, and a model created, before you start querying it.
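Forcing a build and then waiting for the model can be sketched as follows. This is an illustrative sketch under assumptions: the base URL is hypothetical, /refresh is assumed to be a POST, and /ready is assumed to return 200 once a model is loaded and 503 before that.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

# Hypothetical Serving Layer address; adjust to your deployment.
BASE_URL = "http://localhost:8080"


def trigger_refresh() -> None:
    """Ask the layer to rebuild from the data ingested so far (assumed POST /refresh)."""
    req = urllib.request.Request(f"{BASE_URL}/refresh", data=b"", method="POST")
    with urllib.request.urlopen(req):
        pass


def is_ready() -> bool:
    """Assumes /ready answers 200 when a model is loaded and 503 otherwise."""
    try:
        with urllib.request.urlopen(f"{BASE_URL}/ready") as resp:
            return resp.status == 200
    except urllib.error.HTTPError as e:
        if e.code == 503:
            return False
        raise


def wait_until_ready(check: Callable[[], bool], timeout_s: float = 600.0,
                     poll_s: float = 10.0) -> bool:
    """Poll `check` until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)
```

Typical use would be `trigger_refresh()` followed by `wait_until_ready(is_ready)`; passing the check as a function keeps the polling loop testable without a running server.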
Created 02-05-2014 05:53 AM
Thank you, srowen!
OK, I understand. So how do I know:
- the actual amount of data that has been written
- the exact amount of data from which the model will be rebuilt?
Do I have to call /refresh manually, or is it done automatically? If so, how often?
Best regards
Created 02-05-2014 05:57 AM
I would simply call /refresh after you have ingested whatever data you have already. It sounds like you want it to just start with what data it has at a certain point, and that is part of what /refresh does. After that, let it rebuild automatically.
The properties are model.time-threshold and model.data-threshold. These are measured in minutes and megabytes, respectively. A rebuild happens when either threshold is exceeded (time elapsed or data written).
Actually, I misspoke; these default to -1, meaning, do not use a threshold. By default it would not rebuild except on demand. So you should set at least one of these to match your requirements.
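The rebuild rule just described can be modeled in a few lines. This is a sketch of the stated behavior only, not Oryx's actual code: a negative threshold (the default -1) is treated as disabled, and a rebuild fires when any enabled threshold is exceeded. With a setting like `model.time-threshold=5`, the time check alone triggers after five minutes.

```python
def should_rebuild(minutes_elapsed: float, mb_written: float,
                   time_threshold: int = -1, data_threshold: int = -1) -> bool:
    """Model of the rule: rebuild when either enabled threshold is exceeded.

    time_threshold mirrors model.time-threshold (minutes) and data_threshold
    mirrors model.data-threshold (megabytes); negative values disable a check.
    """
    time_hit = time_threshold >= 0 and minutes_elapsed > time_threshold
    data_hit = data_threshold >= 0 and mb_written > data_threshold
    return time_hit or data_hit
```

With both defaults left at -1 this always returns False, which matches "by default it would not rebuild except on demand".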
Also, note that if you start the Computation Layer and there is no model, and the first generation has any data, it will force itself to run. So you could also simply put your data into the generation 00000/inbound directory ahead of time.
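Seeding that first generation can be sketched as writing a file under `<instance-dir>/00000/inbound`. This is a minimal sketch under assumptions: it assumes the inbound format is plain CSV lines of `user,item,value`, and the file name `initial.csv` is hypothetical. With the configuration from the question, `instance_dir` would be `/tmp/oryx/example`.

```python
import csv
from pathlib import Path
from typing import Iterable, Tuple


def seed_inbound(instance_dir: str, prefs: Iterable[Tuple[str, str, float]]) -> Path:
    """Write (user, item, value) rows as CSV into generation 00000's inbound dir."""
    inbound = Path(instance_dir) / "00000" / "inbound"
    inbound.mkdir(parents=True, exist_ok=True)
    out = inbound / "initial.csv"  # hypothetical file name
    with out.open("w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        for user, item, value in prefs:
            writer.writerow([user, item, value])
    return out
```

This would be run before starting the Computation Layer, so that the first generation already contains data when it starts.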
Created 02-08-2014 04:22 AM
Hello
A few days later, it is still in the same state: no recommendations are generated, even though I frequently call /pref/x/y.
Before this I was using Myrrix in the same context, and it was working. Instead of one Myrrix instance, I now have two instances (Computation and Serving), but the app using them is the same (it calls the Serving Layer on the same port with the same APIs).
So I have doubts. How can I check that data is stored in Oryx each time I call the Serving Layer? I see in "EndPoint stats" that the RecommendServlet is called without errors, but how can I be sure the data is stored and ready for the recommendation computation?
Regards
Created 02-08-2014 04:38 AM
See my previous message. Did you ever call /refresh, or set a threshold that you have definitely exceeded? A model won't get built otherwise. The behavior is different from Myrrix, yes.
Created on 02-08-2014 04:40 AM - edited 02-08-2014 04:42 AM
Yes, even after calling /refresh, /recommend/X returns a 503 error.
I have now also configured model.time-threshold = 5, as you mentioned, so that it rebuilds every 5 minutes.
But I still get the 503 error
Created 02-08-2014 04:45 AM
/refresh can't throw a 503 error, at least, I can't see any way for it to do so. Are you sure you are not calling /ready?
Can you please provide the logs printed after calling /refresh? Or really, all your logs. You can mail them directly to me if you like at sowen @ cloudera.
Created 02-08-2014 05:00 AM
OK, the problem turned out to be that the # of features was too high for the tiny amount of data. The model was being built but rejected. It does log a message to this effect when this happens.
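A rough sanity check for this situation can be sketched as below. It is a hypothetical heuristic, not the check Oryx actually performs when it rejects a model: a factorization with k features has a user-feature matrix of rank at most the number of distinct users (and likewise for items), so k larger than either count (e.g. model.features=25 on a handful of users) cannot be fully supported by the data.

```python
from typing import Iterable, Tuple


def features_too_high(prefs: Iterable[Tuple[str, str]], features: int) -> bool:
    """Hypothetical heuristic: flag when features exceeds distinct users or items."""
    pairs = list(prefs)
    users = {user for user, _ in pairs}
    items = {item for _, item in pairs}
    return features > min(len(users), len(items))
```

If this flags your data, lowering model.features (or ingesting more data first) is the direction the fix above points to.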
Created 02-09-2014 04:23 AM
FYI, it is working now!
One more question: I can get N recommendations by calling /recommend/X with the howMany parameter.
Is there any performance issue for Oryx if I call with howMany=10 versus howMany=1000? Is it already computed, so that Oryx just returns the data as is (without computing again)?
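For reference, the two requests being compared differ only in the howMany query parameter. A small sketch of building them, with a hypothetical base URL:

```python
from urllib.parse import quote, urlencode


def recommend_url(base_url: str, user: str, how_many: int = 10) -> str:
    """Build GET /recommend/{user}?howMany=N; the base URL is a placeholder."""
    return f"{base_url}/recommend/{quote(user, safe='')}?{urlencode({'howMany': how_many})}"
```

For example, `recommend_url("http://localhost:8080", "X", 1000)` yields `http://localhost:8080/recommend/X?howMany=1000`.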