Member since: 07-18-2014
Posts: 74
Kudos Received: 0
Solutions: 0
07-05-2015
10:19 PM
Sean, I am not sure why, but it seems related to the firewall. Our Oryx server runs in a virtual LAN and has to talk to another, firewalled virtual LAN. The dynamic port appears to be an ephemeral port, possibly related to this bug: https://issues.apache.org/jira/browse/MAPREDUCE-6338 Still digging into this issue.
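For what it's worth, on firewalled clusters the usual knob is to pin the MapReduce ApplicationMaster's client port range so a firewall rule can cover it. The sketch below assumes mapred-site.xml, and the range 50100-50200 is an example value, not something taken from this cluster; whether it helps at all depends on the bug linked above.

```xml
<!-- mapred-site.xml (sketch): constrain the MR ApplicationMaster's client
     ports so the firewall can allow them. 50100-50200 is an assumed example
     range, not a value from this cluster. -->
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>50100-50200</value>
</property>
```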
07-01-2015
05:55 PM
Sean,

(1) How does the Oryx serving layer know there is a new model generation? I traced the code and it seems to poll continuously, checking the generation status from the folder, and the polling delay seems to differ depending on whether there is "no available generation" (different numbers of minutes). Is that right? Can you explain a little bit?

(2) When the new generation is replacing the old generation, I see a variable "recentlyActiveUsers". What is it used for? https://github.com/cloudera/oryx/blob/85001d5fce34d66c6d9179ab6e82e1cecb3b17ee/als-serving/src/main/java/com/cloudera/oryx/als/serving/generation/GenerationLoader.java#L116

(3) A question on "loadModel(int generationID, Generation currentGeneration)": how is the model of the old generation replaced by the new generation? https://github.com/cloudera/oryx/blob/85001d5fce34d66c6d9179ab6e82e1cecb3b17ee/als-serving/src/main/java/com/cloudera/oryx/als/serving/generation/GenerationLoader.java#L67 Originally, I thought the new generation was loaded from disk into memory and then the old generation in memory was replaced by the loaded new generation. However, after tracing the code, it seems there is no such "hot swap". Based on my tracing, and assuming currentGeneration=2 and generationID=3 (that is, currentGeneration is about to be replaced by 3), it seems that (step 1) the new model is written into currentGeneration and (step 2) the old entries left in currentGeneration are removed. If that is the case, it looks possible that there is an "intermediate" model mixing the old and new generations. Can this "intermediate" model be queried by the recommender? Can you confirm or explain a little bit? Thanks.
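To make the hot-swap question in (3) concrete, here is a minimal sketch (not Oryx's actual code; all class and field names are made up) contrasting an in-place update, which can briefly expose a mix of old and new entries, with a build-then-swap that publishes the new generation atomically:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Sketch only, NOT Oryx's implementation: contrasts the two update styles
// asked about in (3). All names here are hypothetical.
public class GenerationSwapSketch {

  // Style A: in-place update. Readers may briefly observe a mix of old and
  // new entries while new values are merged in and stale keys are removed.
  static void updateInPlace(Map<String, float[]> current, Map<String, float[]> next) {
    current.putAll(next);                       // step 1: overwrite/add new entries
    current.keySet().retainAll(next.keySet());  // step 2: drop entries absent from the new model
  }

  // Style B: build-then-swap. The new map is fully built off to the side and
  // published in one atomic step, so readers only ever see a complete generation.
  static void swapAtomically(AtomicReference<Map<String, float[]>> ref, Map<String, float[]> next) {
    ref.set(new ConcurrentHashMap<>(next));     // single atomic publication
  }
}
```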
07-01-2015
05:37 PM
Yes, I applied your commit. I went to an example: http://[host]:8088/cluster/app/application_1435263631757_19721 but I still don't see the error. As I mentioned, the job/task doesn't really get killed or stopped; it just emits some retry info (below) and then continues:

Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 0 time(s); maxRetries=3
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 1 time(s); maxRetries=3
07-01-2015
08:34 AM
Sean, I applied your changes to our code base and am still seeing a similar error (below). I checked the job via the job tracking URL (e.g., http://server105:8088/proxy/application_1432750221048_0525/) and there is actually no failed attempt.

/// Logs ///
Thu May 28 07:27:57 PDT 2015 INFO Running job "Oryx-/user/xyz/int/def-1-122-Y-RowStep: Avro(hdfs://server105:8020/u... ID=1 (1/1)"
Thu May 28 07:27:57 PDT 2015 INFO Job status available at: http://server105:8088/proxy/application_1432750221048_0525/
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 0 time(s); maxRetries=3
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 1 time(s); maxRetries=3
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 2 time(s); maxRetries=3
...
Thu May 28 07:34:15 PDT 2015 INFO Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
Thu May 28 07:34:16 PDT 2015 INFO Finished Oryx-/user/xyz/int/def-1-122-Y-RowStep
Thu May 28 07:34:16 PDT 2015 INFO Completed RowStep in 379s
06-29-2015
08:01 AM
Cool. I just read your changes and it seems they only affect the local computation (not the Hadoop computation). Correct? Yes, I know the Hadoop computation is already doing the right thing and needs no fix.
06-29-2015
07:04 AM
This is what I found. It looks odd, but can you double-check it?

(1) Hadoop version: with our settings model.decay.factor=1.0 and model.decay.zeroThreshold=0.01, I do NOT see "Pruning near-zero entries" in the Oryx log. However, judging from the results, it does actually perform the pruning.

(2) Stand-alone version (local computation with one VM): with the same settings (model.decay.factor=1.0, model.decay.zeroThreshold=0.01), I DO see "Pruning near-zero entries" in the Oryx log. However, judging from the results, it does NOT actually perform the pruning.

Notes:
(a) In both cases I tested, it is generation 0; that is, there is no previous generation.
(b) Our training data looks like the following; note that it is not pre-aggregated by (user-id, item-id):
user-1, item-a, 1.24
user-1, item-a, 0.002
user-1, item-b, 0.005
user-2, item-c, 0.007
user-3, item-c, 0.006
user-3, item-d, 2.5
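To illustrate why the lack of pre-aggregation in (b) could matter, here is a small sketch (my own illustration, not a claim about what Oryx actually does) showing that pruning raw rows before aggregating per (user-id, item-id) gives a slightly different result than aggregating first and then pruning the sums, using the sample rows above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only, not Oryx code: shows how the order of near-zero pruning vs.
// per-(user,item) aggregation changes the outcome on the sample data above.
public class PruneOrderSketch {
  static final double ZERO_THRESHOLD = 0.01; // same value as model.decay.zeroThreshold

  public static void main(String[] args) {
    String[][] rows = {
        {"user-1", "item-a", "1.24"},
        {"user-1", "item-a", "0.002"},
        {"user-1", "item-b", "0.005"},
        {"user-2", "item-c", "0.007"},
        {"user-3", "item-c", "0.006"},
        {"user-3", "item-d", "2.5"},
    };

    // Order A: prune each raw row first, then aggregate whatever survives.
    Map<String, Double> pruneThenAggregate = new LinkedHashMap<>();
    for (String[] r : rows) {
      double v = Double.parseDouble(r[2]);
      if (Math.abs(v) >= ZERO_THRESHOLD) {
        pruneThenAggregate.merge(r[0] + '/' + r[1], v, Double::sum);
      }
    }

    // Order B: aggregate per (user,item) first, then prune the sums.
    Map<String, Double> aggregateThenPrune = new LinkedHashMap<>();
    for (String[] r : rows) {
      aggregateThenPrune.merge(r[0] + '/' + r[1], Double.parseDouble(r[2]), Double::sum);
    }
    aggregateThenPrune.values().removeIf(v -> Math.abs(v) < ZERO_THRESHOLD);

    // Either way, user-2's only pair falls below the threshold and is dropped;
    // the orders differ only in whether user-1/item-a keeps 1.24 or ~1.242.
    System.out.println("prune then aggregate: " + pruneThenAggregate); // {user-1/item-a=1.24, user-3/item-d=2.5}
    System.out.println("aggregate then prune: " + aggregateThenPrune); // roughly {user-1/item-a=1.242, user-3/item-d=2.5}
  }
}
```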
06-29-2015
12:37 AM
Interesting. Is that actually the source of the problem? I checked my log and there is no container error info. As I mentioned previously, the job did complete, but it complains that it cannot reach some servers during the process.
06-29-2015
12:29 AM
Hmm, I am not seeing that in the Oryx log. Is it in the Hadoop log? Which job step (MergeNewOldStep? RowStep?)?
06-28-2015
11:55 PM
I see. Yes, our settings are model.decay.factor=1.0 and model.decay.zeroThreshold=0.01, and that explains it. However, does it only take effect when running with Hadoop? We use the same settings in the local computation (single VM), but it does not seem to apply the threshold. Thanks.
06-28-2015
11:17 PM
Sean, as I posted in other discussion threads, we are trying to run Oryx 1.0 with CDH 5.4.1. One thing I noticed is that when computing with Hadoop, we have 7.5 million users in the training set, but after training it generates an X matrix of only about 7.3 million users. I checked the log and found no message or error related to this. I also tried the same training dataset in the local computation (single VM), and it gets all 7.5 million users in X fine. I checked the users that got lost during the Hadoop computation and noticed that all of their preference values are less than 0.01. I think it definitely makes sense to ignore associations with very low preference values, but I cannot find such a "config" or "control". Is there such a thing in Oryx? Why does it filter in the Hadoop computation but not in the single-VM computation? Thanks.
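To illustrate the pattern I am describing (this is only my own sketch, not Oryx internals): if every preference a user has falls below a near-zero threshold, pruning those entries removes the user from the result entirely, which would match the users missing from X.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only, not Oryx internals: if all of a user's preference values are
// below the near-zero threshold, pruning them drops the user entirely.
public class LostUsersSketch {

  // prefs: user -> list of raw preference values for that user
  static Map<String, List<Double>> pruneNearZero(Map<String, List<Double>> prefs, double threshold) {
    Map<String, List<Double>> kept = new LinkedHashMap<>();
    for (Map.Entry<String, List<Double>> e : prefs.entrySet()) {
      List<Double> surviving = new ArrayList<>();
      for (double v : e.getValue()) {
        if (Math.abs(v) >= threshold) {
          surviving.add(v);               // keep only entries at or above the threshold
        }
      }
      if (!surviving.isEmpty()) {
        kept.put(e.getKey(), surviving);  // users with no surviving entry vanish
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<String, List<Double>> prefs = new LinkedHashMap<>();
    prefs.put("user-A", List.of(0.004, 0.007)); // every value below 0.01
    prefs.put("user-B", List.of(1.5, 0.002));
    // user-A disappears entirely, mirroring users missing from the X matrix.
    System.out.println(pruneNearZero(prefs, 0.01).keySet()); // prints [user-B]
  }
}
```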
Labels:
- Apache Hadoop
- Training