Member since: 07-18-2014
Posts: 74
Kudos Received: 0
Solutions: 0
07-05-2015
10:19 PM
Sean, I am not sure why, but it seems related to the firewall. Our Oryx server runs in a virtual LAN and has to talk to another, firewalled virtual LAN. The dynamic port appears to be an ephemeral port, possibly related to this bug: https://issues.apache.org/jira/browse/MAPREDUCE-6338 Still digging into this issue.
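For what it's worth, on firewalled clusters the usual knob is to pin the MapReduce ApplicationMaster's client port range so a firewall rule can cover it. The sketch below assumes mapred-site.xml, and the range 50100-50200 is an example value, not something taken from this cluster; whether it helps at all depends on the bug linked above.

```xml
<!-- mapred-site.xml (sketch): constrain the MR ApplicationMaster's client
     ports so the firewall can allow them. 50100-50200 is an assumed example
     range, not a value from this cluster. -->
<property>
  <name>yarn.app.mapreduce.am.job.client.port-range</name>
  <value>50100-50200</value>
</property>
```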
07-01-2015
05:55 PM
Sean,

(1) How does the Oryx serving layer know there is a new model generation? I traced the code and it seems to poll continuously, checking the generation status from the folder, and the polling delay seems to differ depending on whether there is "no available generation" (different numbers of minutes). Is that right? Can you explain a little bit?

(2) When the new generation is replacing the old generation, I see a variable "recentlyActiveUsers". What is it used for? https://github.com/cloudera/oryx/blob/85001d5fce34d66c6d9179ab6e82e1cecb3b17ee/als-serving/src/main/java/com/cloudera/oryx/als/serving/generation/GenerationLoader.java#L116

(3) A question on "loadModel(int generationID, Generation currentGeneration)": how is the model of the old generation replaced by the new generation? https://github.com/cloudera/oryx/blob/85001d5fce34d66c6d9179ab6e82e1cecb3b17ee/als-serving/src/main/java/com/cloudera/oryx/als/serving/generation/GenerationLoader.java#L67 Originally, I thought the new generation was loaded from disk into memory and then the old generation in memory was replaced by the loaded new generation. However, after tracing the code, it seems there is no such "hot swap". Based on my tracing, and assuming currentGeneration=2 and generationID=3 (that is, currentGeneration is about to be replaced by 3), it seems that (step 1) the new model is written into currentGeneration and (step 2) the old entries left in currentGeneration are removed. If that is the case, it looks possible that there is an "intermediate" model mixing the old and new generations. Can this "intermediate" model be queried by the recommender? Can you confirm or explain a little bit? Thanks.
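To make the hot-swap question in (3) concrete, here is a minimal sketch (not Oryx's actual code; all class and field names are made up) contrasting an in-place update, which can briefly expose a mix of old and new entries, with a build-then-swap that publishes the new generation atomically:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicReference;

// Sketch only, NOT Oryx's implementation: contrasts the two update styles
// asked about in (3). All names here are hypothetical.
public class GenerationSwapSketch {

  // Style A: in-place update. Readers may briefly observe a mix of old and
  // new entries while new values are merged in and stale keys are removed.
  static void updateInPlace(Map<String, float[]> current, Map<String, float[]> next) {
    current.putAll(next);                       // step 1: overwrite/add new entries
    current.keySet().retainAll(next.keySet());  // step 2: drop entries absent from the new model
  }

  // Style B: build-then-swap. The new map is fully built off to the side and
  // published in one atomic step, so readers only ever see a complete generation.
  static void swapAtomically(AtomicReference<Map<String, float[]>> ref, Map<String, float[]> next) {
    ref.set(new ConcurrentHashMap<>(next));     // single atomic publication
  }
}
```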
07-01-2015
05:37 PM
Yes, I applied your commit. I went to an example: http://[host]:8088/cluster/app/application_1435263631757_19721 but I still don't see the error. As I mentioned, the job/task doesn't really get killed or stopped; it just emits some retry info (below) and then continues:

Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 0 time(s); maxRetries=3
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 1 time(s); maxRetries=3
07-01-2015
08:34 AM
Sean, I applied your changes to our code base and am still seeing a similar error (below). I checked the job via the job tracking URL (e.g., http://server105:8088/proxy/application_1432750221048_0525/) and there is actually no failed attempt.

/// Logs ///
Thu May 28 07:27:57 PDT 2015 INFO Running job "Oryx-/user/xyz/int/def-1-122-Y-RowStep: Avro(hdfs://server105:8020/u... ID=1 (1/1)"
Thu May 28 07:27:57 PDT 2015 INFO Job status available at: http://server105:8088/proxy/application_1432750221048_0525/
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 0 time(s); maxRetries=3
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 1 time(s); maxRetries=3
Thu May 28 07:29:14 PDT 2015 INFO Retrying connect to server: server104/10.190.36.114:40915. Already tried 2 time(s); maxRetries=3
...
Thu May 28 07:34:15 PDT 2015 INFO Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
Thu May 28 07:34:16 PDT 2015 INFO Finished Oryx-/user/xyz/int/def-1-122-Y-RowStep
Thu May 28 07:34:16 PDT 2015 INFO Completed RowStep in 379s
06-29-2015
08:01 AM
Cool. I just read your changes and it seems they only affect the local computation (not the Hadoop computation). Correct? Yes, I know the Hadoop computation is already doing the right thing and needs no fix.
06-29-2015
07:04 AM
This is what I found. It looks odd, but can you double-check it?

(1) Hadoop version: with our settings model.decay.factor=1.0 and model.decay.zeroThreshold=0.01, I do NOT see "Pruning near-zero entries" in the Oryx log. However, judging from the results, it does actually perform the pruning.

(2) Stand-alone version (local computation with one VM): with the same settings (model.decay.factor=1.0, model.decay.zeroThreshold=0.01), I DO see "Pruning near-zero entries" in the Oryx log. However, judging from the results, it does NOT actually perform the pruning.

Notes:
(a) In both cases I tested, it is generation 0; that is, there is no previous generation.
(b) Our training data looks like the following; note that it is not pre-aggregated by (user-id, item-id):
user-1, item-a, 1.24
user-1, item-a, 0.002
user-1, item-b, 0.005
user-2, item-c, 0.007
user-3, item-c, 0.006
user-3, item-d, 2.5
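To illustrate why the lack of pre-aggregation in (b) could matter, here is a small sketch (my own illustration, not a claim about what Oryx actually does) showing that pruning raw rows before aggregating per (user-id, item-id) gives a slightly different result than aggregating first and then pruning the sums, using the sample rows above:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch only, not Oryx code: shows how the order of near-zero pruning vs.
// per-(user,item) aggregation changes the outcome on the sample data above.
public class PruneOrderSketch {
  static final double ZERO_THRESHOLD = 0.01; // same value as model.decay.zeroThreshold

  public static void main(String[] args) {
    String[][] rows = {
        {"user-1", "item-a", "1.24"},
        {"user-1", "item-a", "0.002"},
        {"user-1", "item-b", "0.005"},
        {"user-2", "item-c", "0.007"},
        {"user-3", "item-c", "0.006"},
        {"user-3", "item-d", "2.5"},
    };

    // Order A: prune each raw row first, then aggregate whatever survives.
    Map<String, Double> pruneThenAggregate = new LinkedHashMap<>();
    for (String[] r : rows) {
      double v = Double.parseDouble(r[2]);
      if (Math.abs(v) >= ZERO_THRESHOLD) {
        pruneThenAggregate.merge(r[0] + '/' + r[1], v, Double::sum);
      }
    }

    // Order B: aggregate per (user,item) first, then prune the sums.
    Map<String, Double> aggregateThenPrune = new LinkedHashMap<>();
    for (String[] r : rows) {
      aggregateThenPrune.merge(r[0] + '/' + r[1], Double.parseDouble(r[2]), Double::sum);
    }
    aggregateThenPrune.values().removeIf(v -> Math.abs(v) < ZERO_THRESHOLD);

    // Either way, user-2's only pair falls below the threshold and is dropped;
    // the orders differ only in whether user-1/item-a keeps 1.24 or ~1.242.
    System.out.println("prune then aggregate: " + pruneThenAggregate); // {user-1/item-a=1.24, user-3/item-d=2.5}
    System.out.println("aggregate then prune: " + aggregateThenPrune); // roughly {user-1/item-a=1.242, user-3/item-d=2.5}
  }
}
```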
06-29-2015
12:37 AM
Interesting. Is that actually the source of the problem? I checked my log and there is no container error info. As I mentioned previously, the job did complete, but it complains that it cannot reach some servers during the process.
06-29-2015
12:29 AM
Hmm, I am not seeing that in the Oryx log. Is it in the Hadoop log? Which job step (MergeNewOldStep? RowStep?)?
06-28-2015
11:55 PM
I see. Yes, our settings are model.decay.factor=1.0 and model.decay.zeroThreshold=0.01, and that explains it. However, does it only take effect when running with Hadoop? We use the same settings in the local computation (single VM), but it does not seem to apply the threshold. Thanks.
06-28-2015
11:17 PM
Sean, as I posted in other discussion threads, we are trying to run Oryx 1.0 with CDH 5.4.1. One thing I noticed is that when computing with Hadoop, we have 7.5 million users in the training set, but after training it generates an X matrix of only about 7.3 million users. I checked the log and found no message or error related to this. I also tried the same training dataset in the local computation (single VM), and it gets all 7.5 million users in X fine. I checked the users that got lost during the Hadoop computation and noticed that all of their preference values are less than 0.01. I think it definitely makes sense to ignore associations with very low preference values, but I cannot find such a "config" or "control". Is there such a thing in Oryx? Why does it filter in the Hadoop computation but not in the single-VM computation? Thanks.
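To illustrate the pattern I am describing (this is only my own sketch, not Oryx internals): if every preference a user has falls below a near-zero threshold, pruning those entries removes the user from the result entirely, which would match the users missing from X.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only, not Oryx internals: if all of a user's preference values are
// below the near-zero threshold, pruning them drops the user entirely.
public class LostUsersSketch {

  // prefs: user -> list of raw preference values for that user
  static Map<String, List<Double>> pruneNearZero(Map<String, List<Double>> prefs, double threshold) {
    Map<String, List<Double>> kept = new LinkedHashMap<>();
    for (Map.Entry<String, List<Double>> e : prefs.entrySet()) {
      List<Double> surviving = new ArrayList<>();
      for (double v : e.getValue()) {
        if (Math.abs(v) >= threshold) {
          surviving.add(v);               // keep only entries at or above the threshold
        }
      }
      if (!surviving.isEmpty()) {
        kept.put(e.getKey(), surviving);  // users with no surviving entry vanish
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    Map<String, List<Double>> prefs = new LinkedHashMap<>();
    prefs.put("user-A", List.of(0.004, 0.007)); // every value below 0.01
    prefs.put("user-B", List.of(1.5, 0.002));
    // user-A disappears entirely, mirroring users missing from the X matrix.
    System.out.println(pruneNearZero(prefs, 0.01).keySet()); // prints [user-B]
  }
}
```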
Labels:
- Apache Hadoop
- Training