Created 05-23-2015 09:13 AM
Sean,
I want to know a little more about the Oryx log lines below (ALS computation).
In particular, what is the heap number? Does it indicate the memory used by the Oryx computation layer while the model is being computed?
Sometimes the number is not close to the heap size we configured for Oryx, and yet it triggers a warning. So I want to confirm what the heap number shown below means.
Thanks.
Jason
Sat May 23 08:57:48 PDT 2015 INFO 5800000 X/tag rows computed (7876MB heap)
Sat May 23 08:57:50 PDT 2015 INFO 5900000 X/tag rows computed (10487MB heap)
Sat May 23 08:57:53 PDT 2015 INFO 6000000 X/tag rows computed (7108MB heap)
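(For reference: a used-heap figure like this is typically derived from the JVM's Runtime API; that is an assumption about how Oryx produces the number, not something confirmed from its source. A figure computed that way naturally rises and falls between garbage collections, which would explain why consecutive log lines disagree. A minimal sketch, with an illustrative class name:

public class HeapReport {
  public static void main(String[] args) {
    // Used heap = memory currently allocated to the heap minus the free part of it.
    Runtime rt = Runtime.getRuntime();
    long usedMB = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
    long maxMB = rt.maxMemory() / (1024 * 1024);   // roughly the -Xmx value
    System.out.println(usedMB + "MB heap used, " + maxMB + "MB max");
  }
}
)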
Created 05-29-2015 12:50 AM
Yes, that's a good reason, if you have to scale up past one machine. Previously I thought you meant you were running an entire Hadoop cluster on one machine, which is fine for a test but much slower and more complex than a simple non-Hadoop one-machine setup.
The mappers and reducers will need more memory if you see them running out of memory. If memory is very low but not exhausted, a Java process slows down in too much GC. Otherwise, more memory does not help. More nodes do not necessarily help either. You still face the overhead of task scheduling and data transfer, and the time taken to do non-distributed work. In fact, if you set up your workers so they do not live on the same nodes as the data nodes, it will be a lot slower.
For your scale, which fits in one machine easily, 7 nodes is big overkill, and 60 is far too big to provide any advantage. You're measuring pure Hadoop overhead, which you can tune, but it does not reflect work done. The upshot is that you should be able to handle data sizes hundreds or thousands of times larger this way, in roughly the same amount of time. For small data sets you can see why there is no value in trying to use a large cluster; the data is just too small to be worth splitting up.
Created 05-23-2015 10:39 AM
Created 05-23-2015 11:36 PM
I set the heap size to 18GB.
During the ALS computation, the log shows the memory warning below. It looks like it is because of the heap size. One confusing thing is that it reports 19244MB of heap used. If that figure is correct, it should throw an OutOfMemoryError, because my 18GB heap is smaller than 19244MB. I find this confusing.
Thanks.
Jason
Sat May 23 15:36:34 PDT 2015 INFO 3800000 X/tag rows computed (19244MB heap)
Sat May 23 15:36:34 PDT 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
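(For what it's worth, the flags named in the warning can simply be appended to the java launch command. The NewRatio value below is only an illustrative choice; any value larger than the default shrinks the new generation. The /xxx paths are placeholders, as elsewhere in this thread:

java -Xmx18432m -XX:NewRatio=3 -XX:+UseCompressedOops -Dconfig.file=/xxx/oryx.conf -jar /xxx/oryx-computation.jar
)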
Created 05-24-2015 04:30 AM
Created 05-24-2015 09:22 AM
Sean,
Thanks for your reply.
(1) Yes, the heap size is set to 18GB. Here is what I run for the Oryx ALS computation:
java -Xmx18432m -Dconfig.file=/xxx/oryx.conf -jar /xxx/oryx-computation.jar
(2) A side question: in the Oryx configuration file (https://github.com/cloudera/oryx/blob/master/common/src/main/resources/reference.conf), there are several settings for the computation that I can tune.
I think I can also pass those settings as JVM parameters, and they should override the default values in the config file. Can you confirm?
For example (using model.features and model.alpha as examples):
java -Xmx18432m -Dconfig.file=/xxx/oryx.conf -Dmodel.features=50 -Dmodel.alpha=50 -jar /xxx/oryx-computation.jar
Thanks.
Jason
Created 05-24-2015 12:54 PM
Yes, it uses Typesafe Config (https://github.com/typesafehub/config), so you should be able to set values on the command line too. Hm, maybe I should change that log line to also output the current max heap, if only to be more informative and to help debug. I'm not sure why you are seeing that number.
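(A minimal sketch of how that layering works, assuming the configuration is loaded through the standard Typesafe Config ConfigFactory.load() and that model.features and model.alpha are numeric keys as in the linked reference.conf. The class name is only illustrative:

import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class ConfigCheck {
  public static void main(String[] args) {
    // ConfigFactory.load() layers -D system properties over the file named by
    // -Dconfig.file, which in turn overrides the reference.conf defaults.
    Config config = ConfigFactory.load();
    System.out.println("model.features = " + config.getInt("model.features"));
    System.out.println("model.alpha = " + config.getDouble("model.alpha"));
  }
}

Launched with the same -D flags as in the example command above, the printed values should reflect the command-line overrides rather than the file defaults.)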
Created on 05-24-2015 06:51 PM - edited 06-08-2015 07:19 PM
Sean,
I continued to dig into the memory usage, but moved my focus to the Oryx serving layer.
I noticed from the Oryx serving log that it loads several main objects:
(1) Loaded feature vectors from .../Y/0.csv.gz (this is the item matrix)
(2) Loaded known items from .../knownItems/0.csv.gz
(3) Loaded feature vectors from .../X/0.csv.gz (this is the user matrix)
Based on these loaded objects, I am trying to estimate the memory used by the model when Oryx serving starts. Assume the number of features (rank) is 50:
(1) (# of items) * 50 * 4 bytes (each feature vector has 50 floats, 4 bytes each in Java)
(2) for each user, 8 bytes for the "long" user ID plus 8 bytes * (# of known items for that user)
(3) (# of users) * 50 * 4 bytes (each feature vector has 50 floats, 4 bytes each in Java)
The total memory usage is basically (1) + (2) + (3).
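(As a quick sanity check, that back-of-the-envelope estimate can be coded up directly; the counts below are hypothetical inputs, not numbers from this thread:

public class ServingMemoryEstimate {
  public static void main(String[] args) {
    long numUsers = 1000000L;          // hypothetical count
    long numItems = 200000L;           // hypothetical count
    long avgKnownItemsPerUser = 100L;  // hypothetical average
    int features = 50;

    long itemMatrixBytes = numItems * features * 4L;                     // (1) 50 floats * 4 bytes
    long knownItemsBytes = numUsers * (8L + 8L * avgKnownItemsPerUser);  // (2) long IDs, 8 bytes each
    long userMatrixBytes = numUsers * features * 4L;                     // (3) 50 floats * 4 bytes

    long totalMB = (itemMatrixBytes + knownItemsBytes + userMatrixBytes) / (1024L * 1024L);
    System.out.println("Rough data size: " + totalMB + " MB, excluding JVM and object overhead");
  }
}
)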
Am I missing any important memory cost when Oryx serving loads the model?
Thanks
Created 05-25-2015 12:07 AM
That's right. There's probably a little more than this due to other JVM overhead and other, much smaller data structures, but yes, that's a good start at an estimate.
Created on 05-25-2015 09:54 AM - edited 06-08-2015 07:21 PM
Sean,
Thanks for the confirmation.
Yes, I understand that there could be more, due to the JVM itself, stacks, code and other data structures, etc.
I noticed that our Oryx serving layer uses 9GB of memory after starting up, but by the estimate above the data itself seems to need only about 1.8GB.
Based on this, I do not understand why there is such a big difference between 9GB and 1.8GB. Is my estimate wrong? Any thoughts?
Jason
Created 05-25-2015 10:19 AM
If you just mean that the heap has grown to 9GB, that is normal in the sense that it does not mean 9GB of memory is actually in use. If you have an 18GB heap, then a major GC has likely not happened yet since there is no memory pressure. I would expect the figure to drop significantly after a major GC. To test, you can force a GC on the running process with "jcmd <pid> GC.run" in Java 7+.
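(For example, assuming the serving layer's process ID turns out to be 12345; jcmd takes the target pid as its first argument:

jcmd -l                (lists running JVMs and their pids)
jcmd 12345 GC.run      (requests a full GC in that JVM)
jstat -gc 12345        (shows heap occupancy, to compare before and after the GC)
)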