Oryx log info of ALS

Explorer

Sean,

 

I'd like to know a little more about the Oryx log lines below (from ALS computation).

In particular, what is the heap number? Does it indicate the memory used by the Oryx computation layer while the model is being computed?

Sometimes we see that the number is not close to the heap size we configured for Oryx, yet it triggers a warning. So I want to confirm what the heap number shown below means.

 

Thanks.

Jason

 

 

Sat May 23 08:57:48 PDT 2015 INFO 5800000 X/tag rows computed (7876MB heap)
Sat May 23 08:57:50 PDT 2015 INFO 5900000 X/tag rows computed (10487MB heap)
Sat May 23 08:57:53 PDT 2015 INFO 6000000 X/tag rows computed (7108MB heap)

 

1 ACCEPTED SOLUTION

Master Collaborator

Yes, that's a good reason if you have to scale up past one machine. Previously I thought you meant you were running an entire Hadoop cluster on one machine, which is fine for a test but much slower and more complex than a simple non-Hadoop one-machine setup. The mapper and reducer will need more memory if you see them running out of memory; if memory is very low but not exhausted, a Java process slows down because it spends too much time in GC. Otherwise, more memory does not help.

More nodes do not necessarily help either. You still face the overhead of task scheduling and data transfer, plus the time taken to do non-distributed work. In fact, if your workers do not live on the same nodes as the data nodes, it will be a lot slower. For your scale, which fits in one machine easily, 7 nodes is big overkill, and 60 is far too many to provide any advantage. You're measuring pure Hadoop overhead, which you can tune, but which does not reflect actual work done. The upshot is that you should be able to handle data sizes hundreds or thousands of times larger this way, in roughly the same amount of time. For a small data set, you can see why there is no value in trying to use a large cluster; it's just too tiny to split up.


13 REPLIES

Master Collaborator
Yes it is just the current heap usage, which is probably not near the max you set. It is normal. What warning do you mean?
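For reference, a "current heap" figure like that is typically derived from the standard JVM Runtime calls, roughly like this (a sketch of the general idea, not necessarily the exact Oryx code):

public class HeapUsage {
  public static void main(String[] args) {
    Runtime rt = Runtime.getRuntime();
    long usedMB = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024); // heap in use right now
    long maxMB = rt.maxMemory() / (1024 * 1024);                        // the -Xmx ceiling
    System.out.println(usedMB + "MB heap used of " + maxMB + "MB max");
  }
}

The used figure fluctuates with garbage collection, which is why it can swing up and down between log lines, as in your output.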

Explorer

I set the heap size to 18GB.

During the ALS computation, the log shows the following memory warning. It looks like it's because of the heap size. One confusing thing is that it reports 19244MB of heap used. If that figure is correct, it should have thrown an OutOfMemoryError, because my heap size is 18GB, which is smaller than 19244MB. I find this confusing.

 

Thanks.

 

Jason

 

 

Sat May 23 15:36:34 PDT 2015 INFO 3800000 X/tag rows computed (19244MB heap)
Sat May 23 15:36:34 PDT 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops

 

Master Collaborator
If you're not seeing a problem, you can ignore it. The thing I'd watch for is whether you are nearly out of memory and spending a lot of time in GC. If so, more heap or these other settings might help.

Are you sure the heap is just 18GB? I agree this doesn't quite make sense otherwise. The memory figure is only an estimate, but it shouldn't ever be more than the heap total.
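If you want to see whether GC is the issue, you can turn on GC logging when launching the computation layer, for example (standard JVM flags for Java 7/8; the log path is a placeholder):

java -Xmx18432m -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/xxx/gc.log -Dconfig.file=/xxx/oryx.conf -jar /xxx/oryx-computation.jar

If the GC log shows frequent full collections that reclaim little memory, then a bigger heap or the settings mentioned in the warning are worth trying.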

Explorer

Sean,

 

Thanks for your reply.

 

(1) Yes, the heap size is set to 18GB. Here is how I launch the Oryx ALS computation:

java -Xmx18432m -Dconfig.file=/xxx/oryx.conf -jar /xxx/oryx-computation.jar

 

(2) A side question: the Oryx reference configuration (https://github.com/cloudera/oryx/blob/master/common/src/main/resources/reference.conf) defines several computation settings that I can tune in my Oryx configuration file.

I think I can also pass these settings as Java system properties on the command line, and they should override the default values from the config file. Can you confirm?

For example (use model.features and model.alpha as examples)

java -Xmx18432m -Dconfig.file=/xxx/oryx.conf -Dmodel.features=50 -Dmodel.alpha=50 -jar /xxx/oryx-computation.jar

 

 

Thanks.

 

Jason

Master Collaborator

Yes it uses Typesafe Config (https://github.com/typesafehub/config) so you should be able to set values on the command line too. Hm, maybe I should change that log to also output the current max heap, if only to be more informative and help debug. I'm not sure why you are seeing that.
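For what it's worth, this is roughly how the layering behaves when the configuration is read (a sketch using the Typesafe Config API; the key names are the ones from your example):

import com.typesafe.config.Config;
import com.typesafe.config.ConfigFactory;

public class ConfigCheck {
  public static void main(String[] args) {
    // load() layers system properties (-Dmodel.features=50, etc.) on top of the
    // file named by -Dconfig.file, which in turn overrides reference.conf defaults.
    Config config = ConfigFactory.load();
    System.out.println("model.features = " + config.getInt("model.features"));
    System.out.println("model.alpha = " + config.getDouble("model.alpha"));
  }
}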

Explorer

Sean,

 

I continued to dig into the memory usage, but shifted my focus to the Oryx serving layer.

From the Oryx serving log, I noticed that it loads several main objects:

 

(1) Loaded feature vectors from .../Y/0.csv.gz (this is the item matrix)

(2) Loaded known items from .../knownItems/0.csv.gz

(3) Loaded feature vectors from .../X/0.csv.gz (this is the user matrix)

 

Based on these loaded objects, I am trying to estimate the memory used by the model when the Oryx serving layer starts. Assume the number of features (rank) is 50:

(1) (# of items) * 50 * 4 bytes (each item feature vector holds 50 floats, 4 bytes each in Java)

(2) 8 bytes for each user's long user ID, plus 8 bytes * (# of known items for that user)

(3) (# of users) * 50 * 4 bytes (each user feature vector holds 50 floats, 4 bytes each in Java)

 

The memory usage is basically (1) + (2) + (3).
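As a back-of-the-envelope sketch (the counts below are placeholders, not our real numbers):

public class ServingMemEstimate {
  public static void main(String[] args) {
    long numItems = 1_000_000L;           // placeholder item count
    long numUsers = 5_000_000L;           // placeholder user count
    long avgKnownItemsPerUser = 20L;      // placeholder average
    int features = 50;

    long itemVectors = numItems * features * 4L;                     // (1) Y: floats
    long knownItems  = numUsers * (8L + 8L * avgKnownItemsPerUser);  // (2) user ID + known item IDs
    long userVectors = numUsers * features * 4L;                     // (3) X: floats

    long totalMB = (itemVectors + knownItems + userVectors) / (1024 * 1024);
    System.out.println("Rough model size: " + totalMB + "MB, before JVM/object overhead");
  }
}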

 

Am I missing any important memory consumers when the Oryx serving layer loads?

 

Thanks

 

 

 

Master Collaborator

That's right. There's probably a little more than this due to other JVM overheads and other much smaller data structures, but that's a good start at an estimate.

Explorer

Sean,

 

Thanks for the confirmation.

 

Yes, I understand that there could be more, due to JVM overhead, stacks, code and other structures, etc.

 

I noticed that our Oryx serving layer uses 9GB of memory after starting up, but the data itself seems to need only about 1.8GB based on the estimate above.

Based on this, I do not understand why there is such a big difference between 9GB and 1.8GB. Is my estimate wrong? Any thoughts?

 

Jason

 

 

 

Master Collaborator

If you just mean the heap has grown to 9GB, that is normal in the sense that it does not mean 9GB of memory is actually in use. If you have an 18GB heap, then a major GC has likely not happened, since there is no memory pressure. I would expect the usage to drop significantly after a major GC. To test, you can force a GC on the running process with "jcmd <pid> GC.run" in Java 7+.
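For example (the process ID is a placeholder; jcmd and jstat ship with the JDK):

jcmd                       # list running Java processes and their PIDs
jcmd <pid> GC.run          # request a full GC on that process
jstat -gcutil <pid> 1000   # print heap occupancy and GC activity every second

After a major GC, the used heap should drop to something much closer to the live data size, i.e. roughly your 1.8GB estimate plus JVM and object overhead.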