02-28-2015
10:44 AM
Sean, following up on the /ingest endpoint and model computation: I ingested a zipped file of about 1 GB (about 4 GB unzipped), roughly 10 million user records. Questions: (1) I expected generation 0 to be computed over the full 1 GB zipped file (i.e., all 10 million users). However, from the computation log, it appears to run generation 0 on only a subset of the data (about 200 MB zipped) and then moves on to generation 1. Is that normal? (2) While running generation 1, it hits the out-of-memory error shown below (my VM is 64-bit Linux with 24 GB of RAM, and I run in local mode). It then triggers the generation 1 computation again, but using different copied /tmp/ files. Any suggestions or comments? Thanks. (A hedged heap-tuning sketch follows the log below.)
(* Log *)
Sat Feb 28 00:09:45 PST 2015 INFO Generation 0 complete
Sat Feb 28 00:10:45 PST 2015 INFO Running new generation due to elapsed time: 409 minutes
Sat Feb 28 00:10:45 PST 2015 INFO Starting run for instance
Sat Feb 28 00:10:45 PST 2015 INFO Last complete generation is 0
Sat Feb 28 00:10:45 PST 2015 INFO Making new generation 2
Sat Feb 28 00:10:45 PST 2015 INFO Waiting 209s for data to start uploading to generation 1 and then move to 2...
Sat Feb 28 00:14:15 PST 2015 INFO Running generation 1
Sat Feb 28 00:14:22 PST 2015 INFO Copying /tmp/1425111255160-0/oryx-append-2983995934077746346.csv.gz to /tmp/1425111255160-1
Sat Feb 28 00:14:22 PST 2015 INFO Copying /tmp/1425111255160-0/oryx-append-7628769517257349553.csv.gz to /tmp/1425111255160-1
Sat Feb 28 00:14:23 PST 2015 INFO Copying /tmp/1425111255160-0/oryx-append-1155998814849386940.csv.gz to /tmp/1425111255160-1
Sat Feb 28 00:14:23 PST 2015 INFO Copying /tmp/1425111255160-0/oryx-append-1103482790086927796.csv.gz to /tmp/1425111255160-1
Sat Feb 28 00:14:23 PST 2015 INFO Copying /tmp/1425111255160-0/oryx-append-6832095713207111419.csv.gz to /tmp/1425111255160-1
Sat Feb 28 00:14:23 PST 2015 INFO Copying /tmp/1425111255160-0/oryx-append-5243247952662826167.csv.gz to /tmp/1425111255160-1
Sat Feb 28 00:14:24 PST 2015 INFO Reading /tmp/1425111255160-3/0.csv.gz
Sat Feb 28 00:14:43 PST 2015 INFO Pruning near-zero entries
Sat Feb 28 00:14:44 PST 2015 INFO No input files in /tmp/1425111255160-5
Sat Feb 28 00:14:44 PST 2015 INFO Pruning near-zero entries
Sat Feb 28 00:14:44 PST 2015 INFO Reading /tmp/1425111255160-4/0.csv.gz
Sat Feb 28 00:14:53 PST 2015 INFO Reading /tmp/1425111255160-1/oryx-append-2983995934077746346.csv.gz
Sat Feb 28 00:15:50 PST 2015 INFO Reading /tmp/1425111255160-1/oryx-append-1155998814849386940.csv.gz
Sat Feb 28 00:16:20 PST 2015 INFO Reading /tmp/1425111255160-1/oryx-append-1103482790086927796.csv.gz
Sat Feb 28 00:16:55 PST 2015 INFO Reading /tmp/1425111255160-1/oryx-append-6832095713207111419.csv.gz
Sat Feb 28 00:17:54 PST 2015 INFO Reading /tmp/1425111255160-1/oryx-append-7628769517257349553.csv.gz
Sat Feb 28 00:19:03 PST 2015 INFO Reading /tmp/1425111255160-1/oryx-append-5243247952662826167.csv.gz
Sat Feb 28 00:20:00 PST 2015 INFO Pruning near-zero entries
Sat Feb 28 00:20:03 PST 2015 INFO Building factorization...
Sat Feb 28 00:20:03 PST 2015 INFO Starting from new, random Y matrix
Sat Feb 28 00:20:03 PST 2015 INFO Constructed initial Y
Sat Feb 28 00:20:03 PST 2015 INFO Executing ALS with parallelism 4
Sat Feb 28 00:20:56 PST 2015 INFO 100000 X/tag rows computed (4689MB heap)
Sat Feb 28 00:21:32 PST 2015 INFO 200000 X/tag rows computed (5207MB heap)
Sat Feb 28 00:22:08 PST 2015 INFO 300000 X/tag rows computed (5821MB heap)
Sat Feb 28 00:22:49 PST 2015 INFO 400000 X/tag rows computed (5192MB heap)
...
...
Sat Feb 28 00:37:25 PST 2015 INFO 2400000 X/tag rows computed (5160MB heap)
Sat Feb 28 00:38:11 PST 2015 INFO 2500000 X/tag rows computed (5935MB heap)
Sat Feb 28 00:38:11 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 00:39:02 PST 2015 INFO 2600000 X/tag rows computed (5858MB heap)
Sat Feb 28 00:39:59 PST 2015 INFO 2700000 X/tag rows computed (5934MB heap)
Sat Feb 28 00:39:59 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
...
...
Sat Feb 28 01:26:17 PST 2015 INFO 6100000 X/tag rows computed (5815MB heap)
Sat Feb 28 01:28:41 PST 2015 INFO 6200000 X/tag rows computed (5926MB heap)
Sat Feb 28 01:28:41 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 01:31:15 PST 2015 INFO 6300000 X/tag rows computed (5652MB heap)
Sat Feb 28 01:33:52 PST 2015 INFO 6400000 X/tag rows computed (5758MB heap)
Sat Feb 28 01:36:35 PST 2015 INFO 6500000 X/tag rows computed (5912MB heap)
Sat Feb 28 01:36:35 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 01:39:28 PST 2015 INFO 6600000 X/tag rows computed (5825MB heap)
Sat Feb 28 01:42:27 PST 2015 INFO 6700000 X/tag rows computed (5864MB heap)
Sat Feb 28 01:45:43 PST 2015 INFO 6800000 X/tag rows computed (5767MB heap)
Sat Feb 28 01:49:00 PST 2015 INFO 6900000 X/tag rows computed (5883MB heap)
Sat Feb 28 01:52:28 PST 2015 INFO 7000000 X/tag rows computed (5982MB heap)
Sat Feb 28 01:52:28 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 01:56:14 PST 2015 INFO 7100000 X/tag rows computed (5881MB heap)
Sat Feb 28 02:00:10 PST 2015 INFO 7200000 X/tag rows computed (5908MB heap)
Sat Feb 28 02:00:10 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 02:04:33 PST 2015 INFO 7300000 X/tag rows computed (5929MB heap)
Sat Feb 28 02:04:33 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 02:09:16 PST 2015 INFO 7400000 X/tag rows computed (5817MB heap)
Sat Feb 28 02:14:13 PST 2015 INFO 7500000 X/tag rows computed (5908MB heap)
Sat Feb 28 02:14:13 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 02:19:37 PST 2015 INFO 7600000 X/tag rows computed (5952MB heap)
Sat Feb 28 02:19:37 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 02:25:32 PST 2015 INFO 7700000 X/tag rows computed (5975MB heap)
Sat Feb 28 02:25:32 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 02:32:12 PST 2015 INFO 7800000 X/tag rows computed (5919MB heap)
Sat Feb 28 02:32:12 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 02:39:46 PST 2015 INFO 7900000 X/tag rows computed (5909MB heap)
Sat Feb 28 02:39:46 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 02:48:25 PST 2015 INFO 8000000 X/tag rows computed (5974MB heap)
Sat Feb 28 02:48:25 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 02:58:51 PST 2015 INFO 8100000 X/tag rows computed (5971MB heap)
Sat Feb 28 02:58:51 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 03:11:47 PST 2015 INFO 8200000 X/tag rows computed (5943MB heap)
Sat Feb 28 03:11:47 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 03:28:50 PST 2015 INFO 8300000 X/tag rows computed (5956MB heap)
Sat Feb 28 03:28:50 PST 2015 WARNING Memory is low. Increase heap size with -Xmx, decrease new generation size with larger -XX:NewRatio value, and/or use -XX:+UseCompressedOops
Sat Feb 28 03:38:35 PST 2015 WARNING Unexpected error in execution
com.cloudera.oryx.computation.common.JobException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.cloudera.oryx.als.computation.local.FactorMatrix.call(FactorMatrix.java:63)
at com.cloudera.oryx.als.computation.local.ALSLocalGenerationRunner.runSteps(ALSLocalGenerationRunner.java:98)
at com.cloudera.oryx.computation.common.GenerationRunner.runGeneration(GenerationRunner.java:236)
at com.cloudera.oryx.computation.common.GenerationRunner.call(GenerationRunner.java:109)
at com.cloudera.oryx.computation.PeriodicRunner.run(PeriodicRunner.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.commons.math3.linear.Array2DRowRealMatrix.copyOut(Array2DRowRealMatrix.java:529)
at org.apache.commons.math3.linear.Array2DRowRealMatrix.getData(Array2DRowRealMatrix.java:254)
at com.cloudera.oryx.common.math.QRDecomposition.<init>(QRDecomposition.java:107)
at com.cloudera.oryx.common.math.RRQRDecomposition.<init>(RRQRDecomposition.java:89)
at com.cloudera.oryx.common.math.CommonsMathLinearSystemSolver.getSolver(CommonsMathLinearSystemSolver.java:37)
at com.cloudera.oryx.common.math.MatrixUtils.getSolver(MatrixUtils.java:126)
at com.cloudera.oryx.als.common.factorizer.als.AlternatingLeastSquares$Worker.call(AlternatingLeastSquares.java:489)
at com.cloudera.oryx.als.common.factorizer.als.AlternatingLeastSquares$Worker.call(AlternatingLeastSquares.java:397)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
Sat Feb 28 03:39:35 PST 2015 INFO Running new generation due to elapsed time: 209 minutes
Sat Feb 28 03:39:35 PST 2015 INFO Starting run for instance
Sat Feb 28 03:39:35 PST 2015 INFO Last complete generation is 0
Sat Feb 28 03:39:36 PST 2015 INFO No need to make a new generation
Sat Feb 28 03:39:36 PST 2015 INFO Generation 2 is old enough to proceed
Sat Feb 28 03:39:36 PST 2015 INFO Running generation 1
Sat Feb 28 03:39:40 PST 2015 INFO Copying /tmp/1425123576534-0/oryx-append-7628769517257349553.csv.gz to /tmp/1425123576534-1
Sat Feb 28 03:39:40 PST 2015 INFO Copying /tmp/1425123576534-0/oryx-append-2983995934077746346.csv.gz to /tmp/1425123576534-1
Sat Feb 28 03:39:41 PST 2015 INFO Copying /tmp/1425123576534-0/oryx-append-1155998814849386940.csv.gz to /tmp/1425123576534-1
Sat Feb 28 03:39:41 PST 2015 INFO Copying /tmp/1425123576534-0/oryx-append-1103482790086927796.csv.gz to /tmp/1425123576534-1
Sat Feb 28 03:39:41 PST 2015 INFO Copying /tmp/1425123576534-0/oryx-append-6832095713207111419.csv.gz to /tmp/1425123576534-1
Sat Feb 28 03:39:42 PST 2015 INFO Copying /tmp/1425123576534-0/oryx-append-5243247952662826167.csv.gz to /tmp/1425123576534-1
Sat Feb 28 03:39:42 PST 2015 INFO Reading /tmp/1425123576534-3/0.csv.gz
Sat Feb 28 03:40:01 PST 2015 INFO Pruning near-zero entries
Sat Feb 28 03:40:02 PST 2015 INFO No input files in /tmp/1425123576534-5
Sat Feb 28 03:40:02 PST 2015 INFO Pruning near-zero entries
Sat Feb 28 03:40:03 PST 2015 INFO Reading /tmp/1425123576534-4/0.csv.gz
Sat Feb 28 03:40:09 PST 2015 INFO Reading /tmp/1425123576534-1/oryx-append-7628769517257349553.csv.gz
Sat Feb 28 03:41:10 PST 2015 INFO Reading /tmp/1425123576534-1/oryx-append-1155998814849386940.csv.gz
Sat Feb 28 03:41:50 PST 2015 INFO Reading /tmp/1425123576534-1/oryx-append-1103482790086927796.csv.gz
Sat Feb 28 03:42:19 PST 2015 INFO Reading /tmp/1425123576534-1/oryx-append-2983995934077746346.csv.gz
Sat Feb 28 03:43:22 PST 2015 INFO Reading /tmp/1425123576534-1/oryx-append-6832095713207111419.csv.gz
Sat Feb 28 03:44:35 PST 2015 INFO Reading /tmp/1425123576534-1/oryx-append-5243247952662826167.csv.gz
Sat Feb 28 03:45:31 PST 2015 INFO Pruning near-zero entries
Sat Feb 28 03:45:34 PST 2015 INFO Building factorization...
Sat Feb 28 03:45:34 PST 2015 INFO Starting from new, random Y matrix
Sat Feb 28 03:45:34 PST 2015 INFO Constructed initial Y
Sat Feb 28 03:45:34 PST 2015 INFO Executing ALS with parallelism 4
Sat Feb 28 03:47:13 PST 2015 INFO 100000 X/tag rows computed (4885MB heap)
Sat Feb 28 03:49:46 PST 2015 INFO 200000 X/tag rows computed (4786MB heap)
...
Sat Feb 28 05:21:19 PST 2015 INFO 1700000 X/tag rows computed (4883MB heap)
Sat Feb 28 05:49:59 PST 2015 INFO 1800000 X/tag rows computed (4882MB heap)
Sat Feb 28 05:57:33 PST 2015 WARNING Unexpected error in execution
com.cloudera.oryx.computation.common.JobException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at com.cloudera.oryx.als.computation.local.FactorMatrix.call(FactorMatrix.java:63)
at com.cloudera.oryx.als.computation.local.ALSLocalGenerationRunner.runSteps(ALSLocalGenerationRunner.java:98)
at com.cloudera.oryx.computation.common.GenerationRunner.runGeneration(GenerationRunner.java:236)
at com.cloudera.oryx.computation.common.GenerationRunner.call(GenerationRunner.java:109)
at com.cloudera.oryx.computation.PeriodicRunner.run(PeriodicRunner.java:214)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
at org.apache.commons.math3.linear.Array2DRowRealMatrix.<init>(Array2DRowRealMatrix.java:62)
at org.apache.commons.math3.linear.Array2DRowRealMatrix.createMatrix(Array2DRowRealMatrix.java:145)
at org.apache.commons.math3.linear.AbstractRealMatrix.transpose(AbstractRealMatrix.java:612)
at com.cloudera.oryx.common.math.QRDecomposition.<init>(QRDecomposition.java:107)
at com.cloudera.oryx.common.math.RRQRDecomposition.<init>(RRQRDecomposition.java:89)
at com.cloudera.oryx.common.math.CommonsMathLinearSystemSolver.getSolver(CommonsMathLinearSystemSolver.java:37)
at com.cloudera.oryx.common.math.MatrixUtils.getSolver(MatrixUtils.java:126)
at com.cloudera.oryx.als.common.factorizer.als.AlternatingLeastSquares$Worker.call(AlternatingLeastSquares.java:489)
at com.cloudera.oryx.als.common.factorizer.als.AlternatingLeastSquares$Worker.call(AlternatingLeastSquares.java:397)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
... 3 more
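For reference, the repeated "Memory is low" warnings in the log already name the relevant JVM options. Below is a hedged example of the kind of flags they suggest, to be appended to whatever command currently launches the computation layer; the specific values are placeholders chosen for a 24 GB machine, not confirmed recommendations.

java -Xmx16g -XX:NewRatio=12 -XX:+UseCompressedOops <existing -jar and configuration arguments>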
02-26-2015
10:38 PM
Sean, as I move to the Oryx API /ingest endpoint, I have the following questions. (1) The Oryx UI /ingest allows you to input userID,itemID,value. If I input two events, (userID-1,itemID-1,2.0) and then (userID-1,itemID-1,1.0), will the system accumulate the value per user ID and item ID, so that the result is equivalent to (userID-1,itemID-1,3.0)? (2) The Oryx UI /ingest also allows you to upload userID,itemID,value rows from a CSV file. As in (1), will it automatically aggregate the values in the CSV by user ID and item ID? Thanks.
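For reference, a minimal sketch of posting the same two events to /ingest as CSV over HTTP, to check whether the values end up summed to 3.0. The host, port, and content type here are assumptions rather than confirmed Oryx settings; replace them with whatever your serving layer actually uses.

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class IngestSketch {
  public static void main(String[] args) throws Exception {
    // Assumed serving-layer address; replace with your own host and port
    URL url = new URL("http://localhost:8091/ingest");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    conn.setRequestMethod("POST");
    conn.setDoOutput(true);
    conn.setRequestProperty("Content-Type", "text/csv");
    // Two events for the same user/item pair
    byte[] body = "userID-1,itemID-1,2.0\nuserID-1,itemID-1,1.0\n".getBytes(StandardCharsets.UTF_8);
    try (OutputStream out = conn.getOutputStream()) {
      out.write(body);
    }
    System.out.println("HTTP " + conn.getResponseCode());
  }
}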
02-26-2015
10:31 PM
Yes, it looks like a local issue with locating Guava on my side. Thanks for your reply.
02-23-2015
06:36 PM
Hi Sean, I got the latest Oryx version (1.0.2-SNAPSHOT) from https://github.com/cloudera/oryx and tried to build it with Maven, but I got the error messages below. Any suggestions? Which Guava version does Oryx use? I am using Java 1.8. Thanks. By the way, I downloaded 1.0.1-SNAPSHOT months ago and it built fine without problems.
--------
[INFO] Reactor Summary:
[INFO]
[INFO] Oryx .............................................. SUCCESS [0.762s]
[INFO] Oryx Common ....................................... FAILURE [2.740s]
[INFO] Oryx Common for Serving and Computation ........... SKIPPED
[INFO] Oryx Alternating Least Squares Common ............. SKIPPED
[INFO] Oryx Serving Layer Common ......................... SKIPPED
[INFO] Oryx Alternating Least Squares Serving ............ SKIPPED
[INFO] Oryx Computation Layer Common ..................... SKIPPED
[INFO] Oryx Alternating Least Squares Computation ........ SKIPPED
[INFO] Oryx K-Means Common ............................... SKIPPED
[INFO] Oryx K-Means Serving .............................. SKIPPED
[INFO] Oryx K-Means Computation .......................... SKIPPED
[INFO] Oryx Random Decision Forests Common ............... SKIPPED
[INFO] Oryx Random Decision Forests Serving .............. SKIPPED
[INFO] Oryx Random Decision Forests Computation .......... SKIPPED
[INFO] Oryx Computation Layer ............................ SKIPPED
[INFO] Oryx Serving Layer ................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.688s
[INFO] Finished at: Mon Feb 23 18:28:29 PST 2015
[INFO] Final Memory: 20M/114M
[INFO] ------------------------------------------------------------------------
[ERROR] com.google.common.base.Supplier not found.
java.lang.ClassNotFoundException: com.google.common.base.Supplier not found.
at org.apache.bcel.util.ClassLoaderRepository.loadClass(ClassLoaderRepository.java:91)
at org.apache.bcel.classfile.JavaClass.getInterfaces(JavaClass.java:788)
at org.apache.bcel.classfile.JavaClass.getAllInterfaces(JavaClass.java:804)
at net.sf.clirr.core.internal.bcel.BcelJavaType.getAllInterfaces(BcelJavaType.java:78)
at net.sf.clirr.core.internal.checks.InterfaceSetCheck.check(InterfaceSetCheck.java:58)
at net.sf.clirr.core.Checker.runClassChecks(Checker.java:190)
at net.sf.clirr.core.Checker.reportDiffs(Checker.java:136)
at org.codehaus.mojo.clirr.AbstractClirrMojo.reportDiffs(AbstractClirrMojo.java:740)
at org.codehaus.mojo.clirr.AbstractClirrMojo.executeClirr(AbstractClirrMojo.java:308)
at org.codehaus.mojo.clirr.AbstractClirrCheckMojo.doExecute(AbstractClirrCheckMojo.java:74)
at org.codehaus.mojo.clirr.AbstractClirrMojo.execute(AbstractClirrMojo.java:244)
at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:106)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:317)
at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:152)
at org.apache.maven.cli.MavenCli.execute(MavenCli.java:555)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:214)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:158)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
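The stack trace points at the clirr-maven-plugin's API-compatibility check rather than the Oryx code itself. As a hedged workaround sketch, assuming the plugin honors the standard clirr.skip property, the check could be bypassed to see whether the build otherwise succeeds:

mvn -Dclirr.skip=true clean install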
01-04-2015
06:47 PM
Sean, as I move to using several Oryx API endpoints, I have the following questions. (1) For the "/because" API: I traced the code and it seems the influence score is computed from the similarity of the Y latent feature vectors between the requested item ID and the known items associated with the requested user ID. Can you confirm? In the ALS paper (Hu et al., "Collaborative Filtering for Implicit Feedback Datasets"), Section 5 ("Explaining recommendations") describes why a specific item was recommended to a user. Does the Oryx implementation follow that paper's approach? (2) Is it possible to get the users' input data (i.e., the rating data as stored in the R matrix, RbyRow and RbyColumn)? I noticed there is no such API and am wondering how to get that information from the internal structures. Thanks.
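To make (1) concrete, here is a minimal sketch of the reading described above: scoring a candidate item against a user's known items by cosine similarity of their Y latent vectors. This only illustrates that interpretation, not the confirmed Oryx implementation, and all names here are hypothetical.

import java.util.HashMap;
import java.util.Map;

public class BecauseSketch {
  // Cosine similarity between two latent feature vectors
  static double cosine(float[] a, float[] b) {
    double dot = 0.0, normA = 0.0, normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  // Score each of the user's known items against the candidate item's Y vector;
  // higher similarity means more "influence" under this reading of /because
  static Map<String, Double> influences(float[] candidateY, Map<String, float[]> knownItemsY) {
    Map<String, Double> scores = new HashMap<>();
    for (Map.Entry<String, float[]> e : knownItemsY.entrySet()) {
      scores.put(e.getKey(), cosine(candidateY, e.getValue()));
    }
    return scores;
  }
}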
12-22-2014
06:00 PM
Hi, I am trying to run Oryx on a machine that is not part of the cluster. My oryx.conf settings (for Hadoop/HDFS) are below. Is this the right setting, and is there anything else I need to set in oryx.conf?

model=${als-model}
model.instance-dir=hdfs://name_node:8020/oryx_data
model.local-computation=false
model.local-data=false

Thanks.
Labels: Apache Hadoop, HDFS
12-11-2014
08:48 AM
Sean, we are trying to handle the cold-start user case, which implies there is no known user-item association. Our approach uses users' profiles: (1) given a new user u, find the k nearest-neighbor users based on profile similarity; (2) use those kNN users' latent vectors (from the X matrix) to approximate the latent feature vector for the new user u. So in this scenario we "already" know the latent vector to add, and the question becomes how to add that feature vector directly to Oryx. I am afraid of breaking the internal structures and the approximation and re-scoring logic, which is why I am looking for your guidance in this scenario where we know which latent vector to "add". One workaround I am considering: simulate a dummy user-item association for the new user and add that dummy preference to Oryx; then, after the kNN-based latent vector computation, "modify" (rather than "add") that new user's latent vector in the existing X matrix. Thanks.
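For illustration only, a minimal sketch of step (2) above, assuming the neighbors' X rows have already been retrieved as float arrays and using similarity-weighted averaging; the names and the weighting choice are assumptions, not part of Oryx.

import java.util.List;

public class ColdStartSketch {
  // Approximate a new user's latent vector as the similarity-weighted average
  // of the latent vectors of its k nearest-neighbor users (by profile similarity).
  static float[] approximateNewUserVector(List<float[]> neighborVectors, List<Double> similarities) {
    int features = neighborVectors.get(0).length;
    float[] result = new float[features];
    double totalWeight = 0.0;
    for (int n = 0; n < neighborVectors.size(); n++) {
      double w = similarities.get(n);
      totalWeight += w;
      float[] v = neighborVectors.get(n);
      for (int f = 0; f < features; f++) {
        result[f] += w * v[f];
      }
    }
    for (int f = 0; f < features; f++) {
      result[f] /= totalWeight;
    }
    return result;
  }
}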
12-10-2014
09:25 AM
PreferenceServlet adds user-item preferences, but for cold-start users there is no such information. I am wondering whether there is a way to "add" latent vectors (X matrix rows) directly without breaking the other related data structures. Thanks.
12-09-2014
05:35 PM
Sean, thanks. Yes, I figured it out and am able to get the latent features for users using some function calls (e.g., getCurrentGeneration(), getX(), getIDMapping()), so latent vector retrieval is fine. What I cannot figure out is how to append a latent vector to the X matrix. Say there is a new user and an external routine computes his latent vector (based on kNN over user profiles and the latent features from Oryx); now I want to "append" that new latent vector to the X matrix and also to the idMapping. Any guidance on how to perform this is appreciated. Thanks.
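Purely as an illustration of the intended operation, and not of Oryx's actual internal types, here is what the append would look like if X and the ID mapping were plain maps; the class and field names are hypothetical stand-ins.

import java.util.HashMap;
import java.util.Map;

public class AppendSketch {
  // Hypothetical stand-ins for the internal structures: X keyed by an internal long ID,
  // and a mapping from the external string user ID to that long ID.
  final Map<Long, float[]> x = new HashMap<>();
  final Map<String, Long> idMapping = new HashMap<>();

  // Append a precomputed latent vector for a new user: register the ID mapping
  // first, then add the row to X under the same internal ID.
  void appendUser(String userId, long internalId, float[] latentVector) {
    idMapping.put(userId, internalId);
    x.put(internalId, latentVector);
  }
}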
12-06-2014
09:00 PM
Sean, to follow up on this, I would like your suggestions on hacking the code a bit. The goal is to get the latent features for items and users; where is a good starting point? Thanks.