12-03-2014
09:56 AM
@sowen wrote: Did you build the binary to match your version of Hadoop? that's the safest thing. What are you using?

pom.xml in Oryx says it builds for 2.5.1, and that is exactly what I have on my machine (<hadoop.version>2.5.1</hadoop.version> in the pom). Nevertheless, I have just run: mvn install -Dhadoop.version=2.5.1 (I had not specified the version before and presumed the one from the pom file was used). I also left the tests enabled; all passed. After that, things behaved exactly as before, I am afraid: X and Y do not have sufficient rank, while everything is fine with local computation.

The Cloudera Quick Start VM, where I ran some other tests, has 2.5.0, so there was a mismatch there; I will build Oryx for 2.5.0 and retry there as well.

@sowen wrote: IDs don't matter. If they are strings representing long values they are used directly (i.e. "123" hashes to 123).

Indeed, I have now changed the IDs to more sensible numbers, but that made no difference.

@sowen wrote: OK, I'll try to get time to try it myself.

Thank you for that. If there is anything I can do to help, I will be online and reachable by email at christina dot androne at gmail.
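For reference, a minimal sketch of that rebuild against the Quick Start VM's Hadoop version, using the same property overridden above:

# Rebuild Oryx against the Hadoop on the Quick Start VM (2.5.0), the same way the
# 2.5.1 build above was produced; add -DskipTests to skip the test run if needed.
mvn clean install -Dhadoop.version=2.5.0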
12-03-2014
04:25 AM
Hi Sean,

I have made more experiments and I think the problem is somewhere in Oryx's code and not in my Hadoop installation, because:
- I booted up the latest Cloudera Quickstart VM and ran: an in-memory computation => MAP 0.15; a Hadoop computation => insufficient rank.
- I ran a Hadoop computation of the Audioscrobbler dataset, on my own Hadoop installation this time, and it produced X and Y with sufficient rank.

So, to conclude, there seems to be an issue strictly related to the format of my files... any hints? The file is UTF-8 encoded and the line endings are LF. I will try rebuilding the user IDs (they are currently very large numbers).
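For what it's worth, a quick way to double-check the encoding and line endings mentioned above (the filename is a placeholder):

# ratings.csv stands in for the actual input file
file -bi ratings.csv        # expect something like: text/plain; charset=utf-8
grep -c $'\r' ratings.csv   # 0 means no carriage returns, i.e. LF-only line endings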
12-02-2014
08:40 AM
... I looked more closely at the Oryx source code and it does seem to depend on Snappy 1.0.4.1. The version I have installed in Hadoop is 1.1.2; could this be an issue, with one party compressing with one version and the other decompressing with a different one, if breaking changes were introduced between them (though I would expect those only with a jump to 2.x.x)?
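A quick way to see which Snappy artifact Oryx actually pulls in, run from the Oryx source tree (a sketch; the exact artifact names depend on the Oryx version):

# List every snappy-related dependency on Oryx's classpath
mvn dependency:tree | grep -i snappy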
12-02-2014
08:31 AM
Hi. Thank you, and apologies for my delay in replying; I am being shared between projects... I have tried a few other, smaller datasets, and the issue is present for them as well. For this last small one, MAP in memory is ~0.14, and with Hadoop it is 0.06. It does look like something is wrong with my Hadoop installation, however I can't figure out what, since the steps are quite simple.

@srowen wrote: You can ignore the native libraries message. It doesn't affect anything. Right, X and Y are deleted after. It may be hard to view them before that happens. The hash from IDs to ints is a consistent one, so the same string will always map to the same ID.

Just a side, low-priority question here: why do the user IDs get generated, but the item IDs don't? My understanding was that the input data constraints are: user IDs should be unique long numbers, item IDs strings, and ratings floats, so this made me think the original user IDs could be reused but item IDs had to be generated.

@srowen wrote: Something funny is going on here and it's probably subtle but simple, like an issue with how the data is read. Your comment about the IDs kind of suggests that the data files aren't being read as intended, so maybe all of these IDs are being treated quite differently as if they are unrelated.

I've redone the Snappy install too, just in case I missed something the first time. I was thinking perhaps the compression is done with one version and the decompression with a different one, hence the "data read" issue. So, is Snappy a dependency of Oryx, and do I perhaps need to rebuild Oryx against the version I have installed in Hadoop?

@srowen wrote: Right, X and Y are deleted after. It may be hard to view them before that happens. Is it possible to send me a link to the data privately, and your config? I can take a look locally.

I have changed Oryx's source code so that it does not wipe out X and Y even when the matrix is subpar (commented out some lines in ALSDistributedGenerationRunner). I have uploaded the input data and run results here: https://drive.google.com/folderview?id=0Bwd5INm6b7z4MENMcWtmQkNHRHM&usp=sharing . I am not concerned about privacy issues, as the data is already anonymized; those IDs don't really mean anything. I have created two folders, one for the in-memory computation and one for the Hadoop computation, both on the same dataset.

A few questions:
- My user IDs are Int64, i.e. 64-bit signed integers. Could this cause problems? Next on my list is to renumber them starting from 1.
- The results I have included are for a test fraction of 0.25, so the output files will differ a lot due to the random splitting (I imagine). Would it be easier for you if I ran the computations without a test fraction?
- Would it be even easier if I ran the computations with RandomUtils.useTestSeed()? And would I have to instruct the reducers to do this too?

Thank you again for being willing to look at this!
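As a footnote to the first question, a minimal sketch of the renumbering mentioned there, assuming the input is a CSV with the user ID in the first column (filenames and column layout are placeholders):

# Map each distinct user ID in column 1 to a small sequential integer starting at 1
awk -F',' 'BEGIN{OFS=","} {if (!($1 in map)) map[$1]=++n; $1=map[$1]; print}' \
    ratings.csv > ratings_renumbered.csv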
11-28-2014
09:40 AM
@srowen wrote: Although the result can vary a bit randomly from run to run, and it's possible you're on the border of insufficient rank, it sounds like this happens consistently?

This happens consistently, yes. With those particular parameters, I always get ~0.11 in memory, and always between 0.006 and 0.009 with Hadoop. The config files are the same; I just commented out the local-computation and local-data lines. The dataset is quite big: the file itself is 84MB and has 3.6 million lines. Also, for the in-memory computations I wrote a tool to automate the search for factors, lambdas and alphas, so I have quite a lot of runs so far, and just one of them performed as badly as these Hadoop runs, and never with these parameters.

@srowen wrote: Are there any errors from the Hadoop workers? do X/ and Y/ contain data? It sounds like the process has stopped too early. I suppose double-check that you do have the same data on HDFS. The config is otherwise the same?

I have checked the computation layer log (in the console where I launched it) and the Hadoop job log. There were no errors anywhere. I do have a warning in the console about being unable to find the Hadoop native libraries and falling back to the built-in Java classes (I've Googled for a fix and will attend to that at some point). X and Y do not contain any data, as they get deleted, according to the computation log:

Fri Nov 28 15:43:09 GMT 2014 INFO Loading X and Y to test whether they have sufficient rank
Fri Nov 28 15:43:14 GMT 2014 INFO Matrix is not yet proved to be non-singular, continuing to load...
Fri Nov 28 15:43:14 GMT 2014 WARNING X or Y does not have sufficient rank; deleting this model and its results

Apart from missing these two folders, the rest of the artifacts get generated. I have compared the known-items file from an in-memory run with one from a Hadoop run: the in-memory one has around 2,000 extra lines and, at first glance, not as many negative user IDs. But the user IDs get generated each time by the algorithm (meaning these new numbers internally replace my user IDs), so I should not expect them to be the same, is that correct?
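For completeness, a sketch of the two config lines being toggled between the two kinds of runs (the "model." prefix is an assumption on my part; the actual key names should be checked against the Oryx config reference):

# Present (and true) for the in-memory runs; commented out for the Hadoop runs,
# with the rest of the file left unchanged
model.local-computation = true
model.local-data = true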
11-28-2014
08:13 AM
Thanks for the help, Sean, I have opened a new thread for the MAP problem.
11-28-2014
08:13 AM
Hi,

I have been running Oryx ALS on the same input dataset, both with local computation and with Hadoop. In memory it produces a MAP around 0.11 and converges after more than 25 iterations; I have run this about 20 times. With Hadoop, same dataset, same parameters, the algorithm converges at iteration 2 and the MAP is 0.00x (I ran it 3 times and wiped out the previous computations each time). With the Hadoop computations I get this message:

Fri Nov 28 15:43:09 GMT 2014 INFO Loading X and Y to test whether they have sufficient rank
Fri Nov 28 15:43:14 GMT 2014 INFO Matrix is not yet proved to be non-singular, continuing to load...
Fri Nov 28 15:43:14 GMT 2014 WARNING X or Y does not have sufficient rank; deleting this model and its results

Any hints, please? Thank you.
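In case it helps, a sketch of how to list what a Hadoop generation actually leaves behind on HDFS, including whether X/ and Y/ survived (the path is the one that appears in the logs later in this thread; the generation number 00000 is an example):

# Recursively list the artifacts of the generation directory
hadoop fs -ls -R hdfs://localhost:9000/home/christina/IdeaProjects/oryx_hadoop/00000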
Labels: Apache Hadoop
11-28-2014
07:00 AM
Thanks again.

@srowen wrote: First you need to figure out where your Hadoop config files are -- core-site.xml, etc. If you unpacked things in /usr/local/hadoop, then it's almost surely /usr/local/hadoop/conf. You have "etc" in your path but shouldn't, and that's the actual problem.

I thought that the conf dir went away a few Hadoop versions ago; all the configuration files are now in (/usr/local/hadoop/)etc/hadoop. This is why I originally thought my Oryx version was assuming I have an old Hadoop, because it was looking for a folder called conf. But things are working now with the environment variables I have set, so that one is solved.

@srowen wrote: Have you installed Snappy? you will need Snappy. I don't know if plain vanilla Apache Hadoop is able to configure and install it for you, although it's part of Hadoop. It's much easier to use a distribution, but your second problem appears to be down to not having Snappy set up.

I have now installed Snappy and things sort of work: I managed to run a generation to completion. In case someone else runs into problems building and installing Snappy, please see the end of this message.

Now I have a different problem: the computation converges at iteration 2 and the MAP is abysmal, 0.006. Plus:

Fri Nov 28 14:46:51 GMT 2014 INFO Loading X and Y to test whether they have sufficient rank
Fri Nov 28 14:46:55 GMT 2014 INFO Matrix is not yet proved to be non-singular, continuing to load...
Fri Nov 28 14:46:55 GMT 2014 WARNING X or Y does not have sufficient rank; deleting this model and its results

I ran the process twice (deleting the previous results), and both runs were like this (the MAP was slightly different the first time, but still around 0.00x). Given this is a random process, every now and again I expect things to be picked badly. But with these particular parameters I ran about 20 in-memory simulations and the MAP was always above 0.11, with convergence between 25 and 60 iterations. With other lambdas, factors and alphas I do get the occasional convergence at iteration 2 and a tiny MAP, but never twice in a row. Should I open a new thread for this?

How to install Snappy

Make sure these 3 packages are installed first:
sudo apt-get install build-essential (needed for snappy)
sudo apt-get install autoconf (needed for snappy-hadoop)
sudo apt-get install libtool (needed for snappy-hadoop; without it you get the m4_pattern_allow error)

Then follow this: https://github.com/electrum/hadoop-snappy . How to build and install Snappy is explained in the file called "INSTALL".
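One extra check that may save others some time: after installing Snappy, it's worth confirming that Hadoop can actually load the native codec (a sketch; this is what rules out the "SnappyCompressor has not been loaded" error shown further down the thread):

# Reports which native libraries (zlib, snappy, lz4, ...) Hadoop can load, and from where
hadoop checknative -a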
11-27-2014
09:45 AM
Tracked down some logs:

2014-11-27 17:31:19,885 ERROR [IPC Server handler 6 on 57154] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_1417106947444_0015_m_000000_0 - exited : java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:69)
at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2014-11-27 17:31:19,885 INFO [IPC Server handler 6 on 57154] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Diagnostics report from attempt_1417106947444_0015_m_000000_0: Error: java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:69)
at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2014-11-27 17:31:19,886 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1417106947444_0015_m_000000_0: Error: java.lang.RuntimeException: native snappy library not available: SnappyCompressor has not been loaded.
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:69)
at org.apache.hadoop.io.compress.SnappyCodec.getCompressorType(SnappyCodec.java:132)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:148)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:163)
at org.apache.hadoop.mapred.IFile$Writer.<init>(IFile.java:115)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1583)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:700)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:770)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
2014-11-27 17:31:19,887 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417106947444_0015_m_000000_0 TaskAttempt Transitioned from RUNNING to FAIL_CONTAINER_CLEANUP
2014-11-27 17:31:19,887 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: Processing the event EventType: CONTAINER_REMOTE_CLEANUP for container container_1417106947444_0015_01_000002 taskAttempt attempt_1417106947444_0015_m_000000_0
2014-11-27 17:31:19,887 INFO [ContainerLauncher #1] org.apache.hadoop.mapreduce.v2.app.launcher.ContainerLauncherImpl: KILLING attempt_1417106947444_0015_m_000000_0
2014-11-27 17:31:19,897 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417106947444_0015_m_000000_0 TaskAttempt Transitioned from FAIL_CONTAINER_CLEANUP to FAIL_TASK_CLEANUP
2014-11-27 17:31:19,898 INFO [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.v2.app.commit.CommitterEventHandler: Processing the event EventType: TASK_ABORT
2014-11-27 17:31:19,905 WARN [CommitterEvent Processor #1] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://localhost:9000/tmp/crunch-839704455/p1/output/_temporary/1/_temporary/attempt_1417106947444_0015_m_000000_0
2014-11-27 17:31:19,906 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417106947444_0015_m_000000_0 TaskAttempt Transitioned from FAIL_TASK_CLEANUP to FAILED
2014-11-27 17:31:19,910 INFO [Thread-51] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: 1 failures on node christina-Precision-T1700
2014-11-27 17:31:19,911 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: attempt_1417106947444_0015_m_000000_1 TaskAttempt Transitioned from NEW to UNASSIGNED
2014-11-27 17:31:19,912 INFO [Thread-51] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Added attempt_1417106947444_0015_m_000000_1 to list of failed maps
2014-11-27 17:31:20,130 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Before Scheduling: PendingReds:11 ScheduledMaps:1 ScheduledReds:0 AssignedMaps:1 AssignedReds:0 CompletedMaps:0 CompletedReds:0 ContAlloc:1 ContRel:0 HostLocal:0 RackLocal:0
2014-11-27 17:31:20,133 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerRequestor: getResources() for application_1417106947444_0015: ask=1 release= 0 newContainers=0 finishedContainers=0 resourcelimit=<memory:6144, vCores:-1> knownNMs=1
2014-11-27 17:31:20,133 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Recalculating schedule, headroom=6144
2014-11-27 17:31:20,133 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Reduce slow start threshold not met. completedMapsForReduceSlowstart 1
2014-11-27 17:31:21,147 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Received completed container container_1417106947444_0015_01_000002
2014-11-27 17:31:21,148 INFO [RMCommunicator Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator: Got allocated containers 1
2014-11-27 17:31:21,148 INFO [AsyncDispatcher event handler] org.apache.hadoop.mapreduce.v2.app.job.impl.TaskAttemptImpl: Diagnostics report from attempt_1417106947444_0015_m_000000_0: Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
11-27-2014
09:31 AM
Hi, thanks for coming back. As you correctly suspected, no, it was not set. I have tried the following:

- Modifying hadoop-env.sh, where I hardcoded HADOOP_CONF_DIR to /usr/local/hadoop/etc/hadoop. This made no difference.
- Modifying ~/.bashrc to include export HADOOP_CONF_DIR=/usr/local/hadoop/etc/hadoop (since hadoop-env.sh was no longer appending the /etc/hadoop suffix). This again made no difference, not even after a reboot, and printenv proved my values were being ignored for some reason (... surely user error, this works for the rest of the people on the Internet...).
- Modifying /etc/environment and setting these:
  HADOOP_INSTALL="/usr/local/hadoop"
  HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"
  HADOOP_MAPRED_HOME="/usr/local/hadoop"
  HADOOP_COMMON_HOME="/usr/local/hadoop"
  HADOOP_HDFS_HOME="/usr/local/hadoop"
  YARN_HOME="/usr/local/hadoop"

With that, I was finally able to start the computation layer without getting the conf dir error. Next I tried two things.

1) Feeding the input file through the serving layer. Despite the upload being successful, things halted with this:

Thu Nov 27 17:06:05 GMT 2014 INFO Completed MergeIDMappingStep in 27s
Thu Nov 27 17:06:05 GMT 2014 WARNING Unexpected exception while running step
com.cloudera.oryx.computation.common.JobException: Oryx-/home/christina/IdeaProjects/oryx_hadoop_ingest-0-MergeIDMappingStep failed in state FAILED
at com.cloudera.oryx.computation.common.JobStep.run(JobStep.java:200)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.cloudera.oryx.computation.common.ParallelStep$1.call(ParallelStep.java:85)
at com.cloudera.oryx.computation.common.ParallelStep$1.call(ParallelStep.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

But, on the plus side, I am able to see the failed MapReduce jobs in YARN (it retried the task), so something must be working better.

2) Pushing the data in through Hadoop. I re-ran commands 1 and 2 from the Commands section of my previous message, this time after starting the computation and serving layers.
The file was picked up and the processing again died later on:

Thu Nov 27 17:24:15 GMT 2014 INFO Completed SplitTestStep in 0s
Thu Nov 27 17:24:15 GMT 2014 INFO Mapper memory: 1024
Thu Nov 27 17:24:15 GMT 2014 INFO Mappers have 787MB heap and can access 1024MB RAM
Thu Nov 27 17:24:15 GMT 2014 INFO Set mapreduce.map.java.opts to '-Xmx787m -XX:+UseCompressedOops -XX:+UseParallelGC -XX:+UseParallelOldGC'
Thu Nov 27 17:24:15 GMT 2014 INFO Reducer memory: 1024
Thu Nov 27 17:24:15 GMT 2014 INFO Reducers have 787MB heap and can access 1024MB RAM
Thu Nov 27 17:24:15 GMT 2014 INFO Set mapreduce.reduce.java.opts to '-Xmx787m -XX:+UseCompressedOops -XX:+UseParallelGC -XX:+UseParallelOldGC'
Thu Nov 27 17:24:15 GMT 2014 INFO Created pipeline configuration Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/usr/local/hadoop/etc/hadoop/core-site.xml, file:/usr/local/hadoop/etc/hadoop/hdfs-site.xml, file:/usr/local/hadoop/etc/hadoop/mapred-site.xml, file:/usr/local/hadoop/etc/hadoop/yarn-site.xml, file:/usr/local/hadoop/etc/hadoop/core-site.xml, file:/usr/local/hadoop/etc/hadoop/hdfs-site.xml, file:/usr/local/hadoop/etc/hadoop/mapred-site.xml, file:/usr/local/hadoop/etc/hadoop/yarn-site.xml
Thu Nov 27 17:24:15 GMT 2014 INFO Will write output files to new path: hdfs://localhost:9000/home/christina/IdeaProjects/oryx_hadoop/00000/idMapping
Thu Nov 27 17:24:15 GMT 2014 INFO Waiting for Oryx-/home/christina/IdeaProjects/oryx_hadoop-0-MergeIDMappingStep to complete
Thu Nov 27 17:24:15 GMT 2014 INFO Connecting to ResourceManager at /0.0.0.0:8032
Thu Nov 27 17:24:15 GMT 2014 INFO Total input paths to process : 1
Thu Nov 27 17:24:15 GMT 2014 INFO DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 84003086
Thu Nov 27 17:24:15 GMT 2014 INFO number of splits:1
Thu Nov 27 17:24:15 GMT 2014 INFO Submitting tokens for job: job_1417106947444_0011
Thu Nov 27 17:24:15 GMT 2014 INFO Submitted application application_1417106947444_0011
Thu Nov 27 17:24:15 GMT 2014 INFO The url to track the job: http://christina-Precision-T1700:8088/proxy/application_1417106947444_0011/
Thu Nov 27 17:24:15 GMT 2014 INFO Running job "Oryx-/home/christina/IdeaProjects/oryx_hadoop-0-MergeIDMappingStep: Text(hdfs://localhost:9000/home/christina/IdeaProjects/or... ID=1 (1/1)"
Thu Nov 27 17:24:15 GMT 2014 INFO Job status available at: http://christina-Precision-T1700:8088/proxy/application_1417106947444_0011/
1 job failure(s) occurred: Oryx-/home/christina/IdeaProjects/oryx_hadoop-0-MergeIDMappingStep: Text(hdfs://localhost:9000/home/christina/IdeaProjects/or... ID=1 (1/1)(1): Job failed!
Thu Nov 27 17:24:58 GMT 2014 INFO Completed MergeIDMappingStep in 43s
Thu Nov 27 17:24:58 GMT 2014 WARNING Unexpected exception while running step
com.cloudera.oryx.computation.common.JobException: Oryx-/home/christina/IdeaProjects/oryx_hadoop-0-MergeIDMappingStep failed in state FAILED
at com.cloudera.oryx.computation.common.JobStep.run(JobStep.java:200)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.cloudera.oryx.computation.common.ParallelStep$1.call(ParallelStep.java:85)
at com.cloudera.oryx.computation.common.ParallelStep$1.call(ParallelStep.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Thu Nov 27 17:24:58 GMT 2014 WARNING Unexpected error in execution
com.cloudera.oryx.computation.common.JobException: Oryx-/home/christina/IdeaProjects/oryx_hadoop-0-MergeIDMappingStep failed in state FAILED
at com.cloudera.oryx.computation.common.JobStep.run(JobStep.java:200)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.cloudera.oryx.computation.common.ParallelStep$1.call(ParallelStep.java:85)
at com.cloudera.oryx.computation.common.ParallelStep$1.call(ParallelStep.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

Any ideas now? I am looking for some logs with more information; perhaps it's some sort of out-of-memory exception somewhere?
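For that last question, a sketch of one way to pull more detailed logs for the failed step (the application ID is the one printed in the log above; YARN log aggregation needs to be enabled for this to return anything):

# Dump the aggregated container logs for the failed MergeIDMappingStep job
yarn logs -applicationId application_1417106947444_0011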