Member since: 08-11-2014
Posts: 481
Kudos Received: 92
Solutions: 72

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3456 | 01-26-2018 04:02 AM |
| | 7094 | 12-22-2017 09:18 AM |
| | 3539 | 12-05-2017 06:13 AM |
| | 3863 | 10-16-2017 07:55 AM |
| | 11239 | 10-04-2017 08:08 PM |
07-02-2015
12:14 AM
Yes, but the question is why. This is just a message from the driver program saying the master can't be found. The real question is what happened to the Application Master. If you can find it in YARN, can you see what happened to that container? It almost surely failed to start, but why?
07-02-2015
12:14 AM
It is just polling HDFS for new files, on the order of every ~5 minutes. No, that message comes from exactly this process of refreshing the model by looking for a new one; "No available generation" means no models have been built yet.

There's a delay between the time new data arrives -- which could include a new user or item -- and when that is incorporated into a model. It could be a long time, depending on how long you take to build models. When a new model arrives, you can't just drop all existing users, since the new model won't have any info about very new users or items. This mechanism keeps track of which users/items should be retained in memory even if they do not exist in the new model.

The new model replaces the old one user-by-user and item-by-item, rather than by loading an entire new model. Yes, you have a state with old and new data at once, but this is fine for recommendations; they're not incompatible. It's just the current and newer state of an estimate of the user/item vectors.
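To make the retention idea concrete, here is a minimal Python sketch (not Oryx's actual Java code; the function and variable names are hypothetical) of merging in a new model while keeping old vectors only for users seen since the new model was built:

```python
# Hypothetical sketch of incremental model replacement: prefer the new
# model's vectors, but retain old vectors for users the new model has
# not yet incorporated, provided they were recently seen.

def merge_models(current, new_model, recently_seen):
    """current / new_model: dict of user -> feature vector.
    recently_seen: users observed since the new model was built."""
    merged = {}
    # Take the new vector wherever the new model has one.
    for user, vector in new_model.items():
        merged[user] = vector
    # Keep old vectors for users absent from the new model, but only
    # if they were recently seen; otherwise let them drop.
    for user, vector in current.items():
        if user not in merged and user in recently_seen:
            merged[user] = vector
    return merged

current = {"u1": [0.1, 0.2], "u2": [0.3, 0.4], "u3": [0.5, 0.6]}
new_model = {"u1": [0.15, 0.25]}   # rebuilt before u2/u3 arrived
recently_seen = {"u2"}             # u2 is new; u3 is stale
print(merge_models(current, new_model, recently_seen))
# {'u1': [0.15, 0.25], 'u2': [0.3, 0.4]}
```

The mixed state this produces (new vectors alongside a few old ones) is exactly the "old and new data at once" situation described above, which is fine for serving recommendations.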
07-01-2015
09:34 AM
Just to check: you have this commit, right? https://github.com/cloudera/oryx/commit/4b5e557a36f3d666bab0befc21b79efdf1fcd52d

The symptom here is that the App Master for the MR job dies straight away and can't be contacted. The important thing is to know why. For example, when I looked at the AM app screen (i.e. http://[host]:8088/cluster/app/application_1435553713675_0018) I saw something like:

Application application_1435553713675_0018 failed 2 times due to AM Container for appattempt_1435553713675_0018_000002 exited with exitCode: -104
For more detailed output, check application tracking page: http://[host]:8088/proxy/application_1435553713675_0018/ Then, click on links to logs of each attempt.
Diagnostics: Container [pid=13840,containerID=container_1435553713675_0018_02_000001] is running beyond physical memory limits. Current usage: 421.5 MB of 384 MB physical memory used; 2.7 GB of 806.4 MB virtual memory used. Killing container.

Do you see anything like that that says why the AM stopped?
06-29-2015
07:55 AM
Got it, that's a bug. I fixed it and pushed to master: https://github.com/cloudera/oryx/issues/115
06-29-2015
12:51 AM
For the stand-alone version? There's no Hadoop there. I mean in the Oryx log, yes. My next question, then, is whether you're sure this config is being used in your stand-alone mode. You can see where it's applied in "ReadInputs".
06-29-2015
12:41 AM
It's pretty likely. It would not be in the logs but in the error shown on the attempt's (dead) container's info screen in the history server. At least, I saw exactly the same thing and this resolved it, and I can sort of see why this is now a problem in Java 7.
06-29-2015
12:18 AM
This is the problem; a fix is coming momentarily: https://github.com/cloudera/oryx/issues/114 I never saw a Snappy issue; I'm on CDH 5.4.2. Right now it seems to be running OK after the above.
06-29-2015
12:15 AM
No, it should work the same in both cases. You should see a message like "Pruning near-zero entries". Are you seeing that much? That would start to narrow it down.
06-28-2015
11:42 PM
Yes, if model.decay.zeroThreshold is positive, then anything whose absolute value is smaller is pruned. This can mean entire users are removed if none of their prefs survive. Do you set this, or decay.factor? By default it's all off and nothing decays, though.
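For illustration, here is a minimal Python sketch (not the actual Oryx implementation; names are hypothetical) of what a positive zero-threshold prune does, including dropping a user entirely when none of its prefs survive:

```python
# Hypothetical sketch of zero-threshold pruning: drop near-zero
# preference values, and drop a user entirely if nothing survives.

def prune(prefs, zero_threshold):
    """prefs: dict of user -> dict of item -> strength."""
    pruned = {}
    for user, items in prefs.items():
        kept = {item: v for item, v in items.items()
                if abs(v) >= zero_threshold}
        if kept:  # users with no surviving prefs are removed outright
            pruned[user] = kept
    return pruned

prefs = {"u1": {"i1": 0.5, "i2": 0.001}, "u2": {"i1": -0.0002}}
print(prune(prefs, 0.01))
# {'u1': {'i1': 0.5}}
```

With the default (threshold off), nothing is pruned, which matches the "by default nothing decays" behavior described above.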
06-28-2015
11:29 PM
I see the same thing now. I bet that if you click through to the failed container, you see an error like:

Diagnostics: Container [pid=13840,containerID=container_1435553713675_0018_02_000001] is running beyond physical memory limits. Current usage: 421.5 MB of 384 MB physical memory used; 2.7 GB of 806.4 MB virtual memory used. Killing container.

If so, then at least we have the cause. I see what is failing, but not yet why, as there's no good reason the AM would only be allowed 384MB. It's a YARN config thing somewhere.
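If a low AM memory cap does turn out to be the culprit, the usual knob for an MR2 job is the Application Master memory setting in mapred-site.xml. This is a generic sketch of where such a limit typically comes from, not something confirmed for this particular cluster:

```xml
<!-- mapred-site.xml: memory granted to the MapReduce Application
     Master. A value as low as 384 MB would have to come from a site
     override somewhere, since the stock default is larger. -->
<property>
  <name>yarn.app.mapreduce.am.resource.mb</name>
  <value>1024</value>
</property>
<property>
  <!-- JVM heap for the AM; keep it comfortably below the container size -->
  <name>yarn.app.mapreduce.am.command-opts</name>
  <value>-Xmx768m</value>
</property>
```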