Member since
07-18-2014
74
Posts
0
Kudos Received
0
Solutions
05-25-2015
09:54 AM
Sean, thanks for the confirmation. Yes, I understand that there could be more, due to JVM overhead, the stack, the code and data structures, etc. I noticed that our oryx-serving uses 9GB of memory after starting up, but the data seems to need about 1.8GB of memory including ... Based on this, I do not understand why there is such a big difference between 9GB and 1.8GB. Is my estimate wrong? Any thoughts? Jason
05-24-2015
06:51 PM
Sean, I continued to dig into the memory usage, but moved my focus to the Oryx serving layer. The serving log indicates that several main objects are loaded:
(1) Loaded feature vectors from .../Y/0.csv.gz (this is the item matrix)
(2) Loaded known items from .../knownItems/0.csv.gz
(3) Loaded feature vectors from .../X/0.csv.gz (this is the user matrix)
Based on these loaded objects, I am trying to compute the memory used by the model when Oryx serving starts. Assume a feature rank of 50:
(1) (# of items) * 50 * 4 bytes, because each feature vector holds 50 floats (4 bytes each in Java)
(2) For each user, 8 bytes for the user ID (a long) plus 8 bytes * (# of known items for that user)
(3) (# of users) * 50 * 4 bytes, because each feature vector holds 50 floats (4 bytes each in Java)
The memory usage is basically (1) + (2) + (3). Am I missing any important part of the memory computation for what Oryx serving loads? Thanks
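The estimate above can be written down as a small calculation. This is only a sketch of the arithmetic in this post: the class name, method names, and the user/item counts in main are made up for illustration, and it counts raw payload bytes only. Java object headers, array headers, and hash table overhead are not included, so the real footprint will be larger.

```java
/** Back-of-the-envelope model size, following the (1) + (2) + (3) breakdown in the post. */
final class ModelMemoryEstimate {
    static final int BYTES_PER_FLOAT = 4;
    static final int BYTES_PER_LONG = 8;

    /** Items (1) and (3): one float per feature for every row of a feature matrix. */
    static long featureMatrixBytes(long rows, int features) {
        return rows * features * BYTES_PER_FLOAT;
    }

    /** Item (2): one long per user ID plus one long per known item, summed over all users. */
    static long knownItemsBytes(long users, long totalKnownItems) {
        return users * BYTES_PER_LONG + totalKnownItems * BYTES_PER_LONG;
    }

    /** Sum of the three parts. Raw payload only, no JVM object overhead. */
    static long totalBytes(long users, long items, long totalKnownItems, int features) {
        return featureMatrixBytes(items, features)
                + knownItemsBytes(users, totalKnownItems)
                + featureMatrixBytes(users, features);
    }

    public static void main(String[] args) {
        // Hypothetical counts, chosen only to exercise the formula.
        long users = 1_000_000L;
        long items = 200_000L;
        long totalKnownItems = 20_000_000L;
        int features = 50;
        long bytes = totalBytes(users, items, totalKnownItems, features);
        System.out.println("Estimated raw payload: " + bytes / (1024L * 1024L) + " MiB");
    }
}
```

Comparing this number with the observed process size gives a rough idea of how much memory goes to overhead rather than to the float and long values themselves.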
05-24-2015
09:22 AM
Sean, thanks for your reply.
(1) Yes, the heap size is set to 18GB. Here is what I run for the Oryx ALS computation:
java -Xmx18432m -Dconfig.file=/xxx/oryx.conf -jar /xxx/oryx-computation.jar
(2) A side question: the Oryx reference configuration (https://github.com/cloudera/oryx/blob/master/common/src/main/resources/reference.conf) lists several computation settings that I can tune in the Oryx configuration file. I think I can also pass these settings as JVM system properties, and they should override the default values in the config file. Can you confirm? For example, using model.features and model.alpha:
java -Xmx18432m -Dconfig.file=/xxx/oryx.conf -jar -Dmodel.features=50 -Dmodel.alpha=50 /xxx/oryx-computation.jar
Thanks. Jason
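The precedence being asked about (a -D system property wins over the config file, which wins over the built-in default) can be illustrated with plain JDK classes. This is a sketch of the idea only, not Oryx code and not the configuration library's API. The class name, the resolve method, and the placeholder values are hypothetical, and whether Oryx actually applies this order is the open question in this post.

```java
import java.util.Properties;

/** Illustrates layered lookup: system property, then config file, then default. */
final class LayeredSettings {

    /** Returns the first value found, checking the most specific layer first. */
    static String resolve(String key, Properties fileSettings, Properties defaults) {
        String fromSystem = System.getProperty(key); // set with -Dkey=value
        if (fromSystem != null) {
            return fromSystem;
        }
        String fromFile = fileSettings.getProperty(key);
        if (fromFile != null) {
            return fromFile;
        }
        return defaults.getProperty(key); // may be null if the key is unknown
    }

    public static void main(String[] args) {
        // Placeholder values, not the real defaults from reference.conf.
        Properties defaults = new Properties();
        defaults.setProperty("model.features", "placeholder-default");
        Properties fileSettings = new Properties();
        fileSettings.setProperty("model.features", "placeholder-from-file");
        System.out.println(resolve("model.features", fileSettings, defaults));
    }
}
```

Running it with and without -Dmodel.features=50 on the command line shows the override taking effect in this toy setup.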
05-23-2015
11:36 PM
I set the heap size to 18GB. During the ALS computation, the logs show the following memory warning, apparently because of the heap size. One confusing thing is that the log reports 19244MB of heap used. If that report is correct, an out-of-memory exception should have been thrown, because my 18GB heap is smaller than 19244MB. I find this confusing. Thanks. Jason
Sat May 23 15:36:34 PDT 2015 INFO 3800000 X/tag rows computed (19244MB heap)
Sat May 23 15:36:34 PDT 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
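One thing worth ruling out here is a unit mismatch. The JVM reads -Xmx18432m as 18432 * 1024 * 1024 bytes, which is 19327352832 bytes, or 19327 when divided by 1,000,000. Whether the Oryx log divides by 1,000,000 or by 1,048,576 is not stated anywhere in this thread, so the sketch below is an assumption to check, not a confirmed explanation. The class and method names are hypothetical.

```java
/** Converts between the -Xmx unit (MiB) and decimal megabytes. */
final class HeapUnits {

    /** -Xmx<N>m means N * 1024 * 1024 bytes. */
    static long mebibytesToBytes(long mebibytes) {
        return mebibytes * 1024L * 1024L;
    }

    /** Bytes expressed in decimal megabytes (10^6 bytes), rounded down. */
    static long bytesToDecimalMegabytes(long bytes) {
        return bytes / 1_000_000L;
    }

    public static void main(String[] args) {
        long maxBytes = mebibytesToBytes(18432L);
        System.out.println("-Xmx18432m in bytes: " + maxBytes);
        System.out.println("In decimal MB: " + bytesToDecimalMegabytes(maxBytes));
    }
}
```

Under that assumption, a reported 19244MB would still be below the configured limit, which would be consistent with a low-memory warning rather than an out-of-memory failure.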
05-23-2015
09:13 AM
Sean, I want to know a little more about the Oryx logs below (ALS computation). In particular, what is the heap number? Does it indicate the memory used by the Oryx computation layer during model computation? Sometimes the number is not close to the heap size given to Oryx, yet a warning is logged. So I want to confirm what the heap number shown below means. Thanks. Jason
Sat May 23 08:57:48 PDT 2015 INFO 5800000 X/tag rows computed (7876MB heap)
Sat May 23 08:57:50 PDT 2015 INFO 5900000 X/tag rows computed (10487MB heap)
Sat May 23 08:57:53 PDT 2015 INFO 6000000 X/tag rows computed (7108MB heap)
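For comparison, this is how a Java program can read its own heap usage with the standard Runtime API. It is a sketch of one common way to get such a number, and it is only an assumption that the Oryx log is produced in a similar way. The class and method names are hypothetical.

```java
/** Reads current heap usage of the running JVM through java.lang.Runtime. */
final class HeapUsage {

    /** Heap currently occupied: memory reserved by the JVM minus the free part of it. */
    static long usedHeapBytes() {
        Runtime runtime = Runtime.getRuntime();
        return runtime.totalMemory() - runtime.freeMemory();
    }

    /** Upper bound the heap may grow to, as set by -Xmx. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    static long toMebibytes(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Used heap: " + toMebibytes(usedHeapBytes()) + "MB");
        System.out.println("Max heap:  " + toMebibytes(maxHeapBytes()) + "MB");
    }
}
```

A value measured this way includes garbage that has not been collected yet, so it can rise and fall between consecutive readings, much like the 7876MB, 10487MB, 7108MB sequence in the log above.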
05-21-2015
10:48 PM
Sean, can you provide more detail about how the approximation is computed in Oryx 2.0? Is it the same fold-in approach as in Oryx 1.0? Can you point me to the relevant code as a reference? Thanks. Jason
04-30-2015
10:58 PM
Sean, a follow-up question: I want to know when Oryx picks up changes to the Hadoop configuration files. As I understand it, the Hadoop config files are read when the Oryx computation and serving layers start. If the Hadoop configuration files change afterward, should I restart the Oryx computation and serving layers to pick up the updated files? In other words, when do the Oryx computation and serving layers read the Hadoop configuration files? Thanks.
04-10-2015
08:20 AM
Sean, one more thing to follow up on, about locks. For example, to write an entry to the X matrix, I do something like this:
Lock writeLock = generation.getXLock().writeLock();
writeLock.lock();
try {
  // write to X matrix
} finally {
  writeLock.unlock();
}
(1) My understanding is that writeLock.lock() prevents simultaneous writes from other writers. Right?
(2) How about readLock? Can you explain the situation in which readLock should be used?
Lock readLock = generation.getXLock().readLock();
readLock.lock();
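For reference, here is how the two locks of a standard java.util.concurrent ReadWriteLock are typically used together. The call chain in this post suggests getXLock() returns such a lock, but that is an assumption. GuardedMatrix and its map are hypothetical stand-ins for the X matrix, not Oryx classes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** A map of ID to feature vector guarded by a read/write lock. */
final class GuardedMatrix {
    private final Map<Long, float[]> rows = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    /** Writers take the write lock: exclusive against other writers and all readers. */
    void put(long id, float[] vector) {
        Lock writeLock = lock.writeLock();
        writeLock.lock();
        try {
            rows.put(id, vector.clone());
        } finally {
            writeLock.unlock(); // always released, even if the write throws
        }
    }

    /** Readers take the read lock: many readers may hold it at once, but no writer. */
    float[] get(long id) {
        Lock readLock = lock.readLock();
        readLock.lock();
        try {
            float[] vector = rows.get(id);
            return vector == null ? null : vector.clone();
        } finally {
            readLock.unlock();
        }
    }
}
```

With ReentrantReadWriteLock, the read lock is the one to hold whenever code only looks at the shared structure. It lets lookups run concurrently while still keeping them from observing a half-finished write.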
03-29-2015
08:59 AM
Thanks. Regarding what you mentioned, "New users also cause a new row in X", I just want to confirm what I traced (please let me know if my understanding is not right). My tracing indicates that:
(1) The new X feature vector is added at https://github.com/cloudera/oryx/blob/master/als-serving/src/main/java/com/cloudera/oryx/als/serving/ServerRecommender.java#L735
(2) The feature vector is then updated with foldIn at https://github.com/cloudera/oryx/blob/master/als-serving/src/main/java/com/cloudera/oryx/als/serving/ServerRecommender.java#L697
(3) I do not need to worry about the candidate filter, because it is used for items.
03-28-2015
10:16 PM
Sean, thanks for your reply, and sorry for the late response. I was trying to implement a kNN-based approach to approximate the latent features for a cold-start new user. Now I have started to dig into Oryx to see where to insert the new user's latent features into the X matrix. Following your hints, I checked the setPreference code and I have two questions:
(1) I noticed the code below and traced the related code. I do not understand why a CandidateFilter is involved, or where the item is added. Will it append a new entry to the Y matrix to hold the latent features for the new item?
if (newItem) { generation.getCandidateFilter().addItem(itemID); }
(https://github.com/cloudera/oryx/blob/master/als-serving/src/main/java/com/cloudera/oryx/als/serving/ServerRecommender.java#L692)
(2) Following up on (1), why is there no similar handling for a new user? Say there is a user-item association and the user ID does not exist in the model. Why are there no statements such as:
if (newUser) { // do something }
Thanks a lot.
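The post does not say how the kNN approximation is computed. One simple reading, used here purely as an assumption, is to average the latent feature vectors of the k most similar existing users. The sketch below takes the neighbor vectors as given (how neighbors are selected is left out), and the class and method names are hypothetical.

```java
import java.util.List;

/** Approximates a new user's latent features from already-selected neighbors. */
final class ColdStartFeatures {

    /** Element-wise mean of the neighbors' feature vectors. */
    static float[] averageOfNeighbors(List<float[]> neighborVectors) {
        if (neighborVectors.isEmpty()) {
            throw new IllegalArgumentException("at least one neighbor is required");
        }
        int features = neighborVectors.get(0).length;
        float[] mean = new float[features];
        for (float[] vector : neighborVectors) {
            if (vector.length != features) {
                throw new IllegalArgumentException("all vectors must have the same length");
            }
            for (int i = 0; i < features; i++) {
                mean[i] += vector[i];
            }
        }
        for (int i = 0; i < features; i++) {
            mean[i] /= neighborVectors.size();
        }
        return mean;
    }
}
```

The resulting vector has the same length as the model's feature vectors, which is what a new row in the X matrix would need.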