<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Oryx ALS: X and Y do not have sufficient rank in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5911#M18163</link>
    <description>Oops, fixed. Yes, I'm using CDH5b1 too, so that's not a difference. Can you compile from HEAD to make sure we're synced up there? You may already be; just checking. I can make a binary too. Any logs would be of interest for sure. I suppose I would suggest trying again with clearly small values for features (like 10) and for lambda (like 0.0001) to see if that at least works. I would expect a lower number of features to be appropriate given the smallish number of items. You might try the optimizer again with lower ranges for both. More features encourage overfitting and a larger lambda encourages underfitting, so they somewhat counteract each other. It's possible you'll find a better value when both are low.</description>
    <pubDate>Sat, 08 Feb 2014 23:08:23 GMT</pubDate>
    <dc:creator>srowen</dc:creator>
    <dc:date>2014-02-08T23:08:23Z</dc:date>
    <item>
      <title>Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5727#M18150</link>
      <description>&lt;P&gt;A little info on the system this is running on:&lt;/P&gt;&lt;P&gt;I'm running CDH5 Beta1 on RHEL6U5, using the parcel installation method. I've set $JAVA_HOME to the Cloudera-installed 1.7_25 version.&amp;nbsp; Oryx was downloaded from GitHub and built from source, using the hadoop22 profile.&amp;nbsp; The data source for the ALS job is on HDFS, not local.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I have a dataset containing 3,766,950 observations in User,Product,Strength format, which I am trying to use with the Oryx ALS collaborative filtering algorithm.&amp;nbsp;&amp;nbsp; Roughly 67.37% of the observations have a weight of 1.&amp;nbsp; My problem is that when I execute the ALS job, it reports that either X or Y does not have sufficient rank, and the model is deleted.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've attempted running the Myrrix ParameterOptimizer using the following command (3 steps, 50% sample):&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;java -Xmx4g -cp myrrix-serving-1.0.1.jar net.myrrix.online.eval.ParameterOptimizer data 3 .5 model.features=10:150 model.als.lambda=0.0001:1&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It recommended using {model.als.lambda=1, model.features=45}, which I then used in the configuration file.&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;The configuration file itself is very simple:&lt;/P&gt;&lt;PRE&gt;model=${als-model}
model.instance-dir=/Oryx/data
model.local-computation=false
model.local-data=false
model.features=45
model.lambda=1
serving-layer.api.port=8093
computation-layer.api.port=8094&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And the computation command:&lt;/P&gt;&lt;PRE&gt;java -Dconfig.file=als.conf -jar computation/target/oryx-computation-0.4.0-SNAPSHOT.jar&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;After 20m or so of processing, these are the final few lines of output:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;Thu Feb 06 12:49:08 EST 2014 INFO Loading X and Y to test whether they have sufficient rank
Thu Feb 06 12:49:24 EST 2014 INFO Matrix is not yet proved to be non-singular, continuing to load...
Thu Feb 06 12:49:24 EST 2014 WARNING X or Y does not have sufficient rank; deleting this model and its results
Thu Feb 06 12:49:24 EST 2014 INFO Deleting recursively: hdfs://nameservice1/Oryx/data/00000/X
Thu Feb 06 12:49:24 EST 2014 INFO Deleting recursively: hdfs://nameservice1/Oryx/data/00000/Y
Thu Feb 06 12:49:24 EST 2014 INFO Signaling completion of generation 0
Thu Feb 06 12:49:24 EST 2014 INFO Deleting recursively: hdfs://nameservice1/Oryx/data/00000/tmp
Thu Feb 06 12:49:24 EST 2014 INFO Dumping some stats on generation 0
Thu Feb 06 12:49:24 EST 2014 INFO Generation 0 complete&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Any ideas on why this isn't working with the recommended Features count and Lambda?&amp;nbsp; The ALS audioscrobbler example works fine, and the data format is similar (though the strengths are considerably smaller on my dataset).&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;James&lt;/P&gt;</description>
      <pubDate>Fri, 16 Sep 2022 15:39:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5727#M18150</guid>
      <dc:creator>JamesConner</dc:creator>
      <dc:date>2022-09-16T15:39:27Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5733#M18151</link>
      <description>&lt;P&gt;Hmm, that does sound strange. lambda = 1 is on the high side, although it may have come out as the best value given the values tested in the optimizer and given the variation in random starting points, etc. There is some randomness on the other side too, when it builds and tests the factorization.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;My first guess is: decrease lambda. You might re-run the optimizer and restrict it to at most 0.1. This isn't a great answer, but I think it may be the fastest path to something working.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Longer-term, this is going to be rewritten to fully integrate parameter search into model building, so it won't be a separate process that may disagree with the model build.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2014 18:34:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5733#M18151</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-02-06T18:34:27Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5741#M18152</link>
      <description>&lt;P&gt;Hi Sean,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I should have mentioned that I've tried a few variations, each with the same result as the recommended Feature/Lambda settings.&amp;nbsp; These are the combinations I've tried so far:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Features : Lambda&lt;/P&gt;&lt;P&gt;20 : 0.065&lt;/P&gt;&lt;P&gt;100 : 0.065&lt;/P&gt;&lt;P&gt;45 : 1&lt;/P&gt;&lt;P&gt;45 : 0.1&lt;/P&gt;&lt;P&gt;50 : 0.1&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;All of those combinations end with the following error:&lt;/P&gt;&lt;PRE&gt;Thu Feb 06 14:20:37 EST 2014 INFO Loading X and Y to test whether they have sufficient rank
Thu Feb 06 14:20:50 EST 2014 INFO Matrix is not yet proved to be non-singular, continuing to load...
Thu Feb 06 14:20:50 EST 2014 WARNING X or Y does not have sufficient rank; deleting this model and its results
Thu Feb 06 14:20:50 EST 2014 INFO Deleting recursively: hdfs://nameservice1/Oryx/data/00000/X
Thu Feb 06 14:20:50 EST 2014 INFO Deleting recursively: hdfs://nameservice1/Oryx/data/00000/Y
Thu Feb 06 14:20:50 EST 2014 INFO Signaling completion of generation 0
Thu Feb 06 14:20:50 EST 2014 INFO Deleting recursively: hdfs://nameservice1/Oryx/data/00000/tmp
Thu Feb 06 14:20:50 EST 2014 INFO Dumping some stats on generation 0
Thu Feb 06 14:20:50 EST 2014 INFO Generation 0 complete&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2014 19:33:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5741#M18152</guid>
      <dc:creator>JamesConner</dc:creator>
      <dc:date>2014-02-06T19:33:23Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5761#M18153</link>
      <description>&lt;P&gt;Hmm. How much data are we talking about? Are you building the model on the same data you optimized from?&lt;/P&gt;&lt;P&gt;How many unique users and items?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The general remedy is fewer features and lower lambda, but it can't be right that the optimizer is fine with this while the model build isn't fine with any of those values. Something is not right here...&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2014 20:32:44 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5761#M18153</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-02-06T20:32:44Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5793#M18154</link>
      <description>&lt;P&gt;The model is indeed being built from the full dataset, while the optimization was performed against a 50% sample.&amp;nbsp; To get the sample, I downloaded the dataset from HDFS to the local filesystem, and performed a "head -n 1883475 data50percent.csv".&amp;nbsp; Then I ran the optimizer locally, not distributed.&amp;nbsp; Should I use the full dataset instead?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Dataset size: 125MB&lt;/P&gt;&lt;P&gt;Number of records: 3,766,950&lt;/P&gt;&lt;P&gt;Unique users: 608,146&lt;/P&gt;&lt;P&gt;Unique items: 1,151&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2014 21:50:27 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5793#M18154</guid>
      <dc:creator>JamesConner</dc:creator>
      <dc:date>2014-02-06T21:50:27Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5817#M18155</link>
      <description>&lt;P&gt;I would use 100% of the data, yes, but I don't think that should make a big difference.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The number of items is low, but not that low. Is there any reason to think the items are very 'redundant'?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;This is strange enough that I think there may be a bug somewhere. Is this data you can share, in anonymized form, offline?&lt;/P&gt;&lt;P&gt;I am wondering whether the singularity threshold needs to be configurable.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I would still try turning down lambda / features to get it going, although I am still not seeing a reason why it should be necessary.&lt;/P&gt;</description>
      <pubDate>Thu, 06 Feb 2014 23:32:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5817#M18155</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-02-06T23:32:54Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5831#M18156</link>
      <description>&lt;P&gt;Some of the items are exceptionally popular, while a large number of the other items have very low values.&amp;nbsp; The weight is a simple count of the items per user within a timeframe.&amp;nbsp; So a userID/itemID combo should only be seen once, but some of those items are seen for a very large percentage of the userIDs.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've tried setting Features to 5, and Lambda to .01, which also failed.&amp;nbsp; I'll try setting Features to 3 and Lambda to .0001 and see if that has any effect.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'll verify with our Legal dept about sending the data over, but it shouldn't be an issue.&amp;nbsp; I know I have your card from when we met in London and NY Strata, but it's in my desk at work, and I'm working from home, so you might have to message me with your email address or data drop location.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Feb 2014 13:08:11 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5831#M18156</guid>
      <dc:creator>JamesConner</dc:creator>
      <dc:date>2014-02-07T13:08:11Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5861#M18157</link>
      <description>&lt;P&gt;I'm cleared to send the dataset, just need to know where it's going!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;James&lt;/P&gt;</description>
      <pubDate>Fri, 07 Feb 2014 19:18:45 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5861#M18157</guid>
      <dc:creator>JamesConner</dc:creator>
      <dc:date>2014-02-07T19:18:45Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5867#M18158</link>
      <description>&lt;P&gt;Thanks James, I'm sowen at cloudera. It sounds like something I will have to debug as it's either quite subtle or a bug in the program. I'll solve it this weekend.&lt;/P&gt;</description>
      <pubDate>Fri, 07 Feb 2014 20:09:30 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5867#M18158</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-02-07T20:09:30Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5899#M18159</link>
      <description>&lt;P&gt;Thanks James, I got the data.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;FWIW, it built successfully locally, which is at least good. That is not a solution, but might get you moving.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I ran the data on Hadoop (CDH5) and uncovered a different problem, which I fixed:&amp;nbsp;&lt;A target="_blank" href="https://github.com/cloudera/oryx/commit/fee977f6a682ba6a2e8c2e48275cb4dc5718c8b2"&gt;https://github.com/cloudera/oryx/commit/fee977f6a682ba6a2e8c2e48275cb4dc5718c8b2&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It is basically an artifact of having virtually all even IDs. That shouldn't be a problem of course.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It ran successfully after that. Is the data you showed me the same as what you're running, or a sample (or redacted)? Just want to rationalize why you didn't see the error I did and then see if that has any clue to why it works for me.&lt;/P&gt;</description>
      <pubDate>Sat, 08 Feb 2014 15:57:08 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5899#M18159</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-02-08T15:57:08Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5903#M18160</link>
      <description>&lt;P&gt;I'll try the local build on one of the datanodes; that shouldn't be a problem for what I'm testing.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It's the full dataset, but the original data was actually userID/prodDesc/weight. Our security team informed me that I could send the data if I changed the prodDesc to a prodID, since it's pretty meaningless without lookup tables.&amp;nbsp; So the Item variable went from a string when I was testing it to a numeric value; perhaps that's why I didn't see the same error.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;So I'm wondering if the problem is only seen when the Item variable is a string. An easy way to test it would be to hash the prodID, which would give an alphanumeric string, similar in format to the original prodDesc.&lt;BR /&gt;&lt;BR /&gt;I can hash the data and re-upload it, or you can run this little bit of Python:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;#!/usr/bin/env python3
# Replace the item ID (second column) with its uppercase SHA-1 hex digest.

import csv
import hashlib

# newline="" lets the csv module handle line endings itself
with open("cloudera_data.csv", newline="") as src, \
        open("output.csv", "w", newline="") as dst:
    writer = csv.writer(dst, delimiter=",")
    for row in csv.reader(src, delimiter=","):
        # Hash the item ID so it becomes an alphanumeric string
        item_id = row[1].strip("\n")
        row[1] = hashlib.sha1(item_id.encode("utf-8")).hexdigest().upper()
        writer.writerow(row)&lt;/PRE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sat, 08 Feb 2014 19:14:16 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5903#M18160</guid>
      <dc:creator>JamesConner</dc:creator>
      <dc:date>2014-02-08T19:14:16Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5905#M18161</link>
      <description>&lt;P&gt;Yes, that explains why you didn't see the same initial problem. Well, good that it was fixed anyhow.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Text vs numeric shouldn't matter at all. Underneath they are both hashed. It looks like the amount of data and its nature are the same if it's just that IDs were hashed. I can't imagine collisions are an issue.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I tried converting these 1-1 to an ID that is alphanumeric, and it worked for me.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You are using CDH 4.x rather than 5, right? That could be a difference, but I still wouldn't expect it to cause a problem of this form.&lt;/P&gt;&lt;P&gt;Anything else of interest in the logs? You're welcome to send me all of it.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;You're starting from scratch when you run the test?&lt;/P&gt;</description>
      <pubDate>Sat, 08 Feb 2014 21:18:26 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5905#M18161</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-02-08T21:18:26Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5909#M18162</link>
      <description>&lt;P&gt;I'm using CDH5 Beta 1, with Oryx compiled against the hadoop22 profile.&amp;nbsp; Speaking of which, you may want to update the Build documentation on github, which states to use profile name "cdh5", but the pom.xml actually uses hadoop22 as the profile name.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'll try running the test again tonight and see how it works out.&amp;nbsp; If I see anything else, I'll send you the log output, but I'm hoping for the best!&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;And yes, every test is started from scratch, just in case!&lt;/P&gt;</description>
      <pubDate>Sat, 08 Feb 2014 22:59:14 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5909#M18162</guid>
      <dc:creator>JamesConner</dc:creator>
      <dc:date>2014-02-08T22:59:14Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5911#M18163</link>
      <description>Oops, fixed. Yes, I'm using CDH5b1 too, so that's not a difference. Can you compile from HEAD to make sure we're synced up there? You may already be; just checking. I can make a binary too. Any logs would be of interest for sure. I suppose I would suggest trying again with clearly small values for features (like 10) and for lambda (like 0.0001) to see if that at least works. I would expect a lower number of features to be appropriate given the smallish number of items. You might try the optimizer again with lower ranges for both. More features encourage overfitting and a larger lambda encourages underfitting, so they somewhat counteract each other. It's possible you'll find a better value when both are low.</description>
      <pubDate>Sat, 08 Feb 2014 23:08:23 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5911#M18163</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-02-08T23:08:23Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5935#M18164</link>
      <description>&lt;P&gt;I'm able to replicate this issue as well.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I've run through various combinations of lambda/feature pairs. No luck.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'm running the latest CDH4 binaries.&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Sean, would you like my data set?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2014 03:39:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5935#M18164</guid>
      <dc:creator>bearrito</dc:creator>
      <dc:date>2014-02-10T03:39:05Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5945#M18165</link>
      <description>It's "normal" for this result to happen if the parameters are way out of kilter for the data set. I suppose it tends to be easier for that to happen with small data. So whether it's reproducing a problem depends on the data. But if you think the params are quite reasonable for the data and you see this, yes please send it to me.</description>
      <pubDate>Mon, 10 Feb 2014 11:29:54 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5945#M18165</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-02-10T11:29:54Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5949#M18166</link>
      <description>&lt;P&gt;Hi Sean,&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Good news:&amp;nbsp; I recompiled and gave it a whirl giving it 10 Features and .0001 Lambda as a first pass.&amp;nbsp; Nothing abnormal or unusual jumps out at me in the output, so I believe the commits you made did the trick.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Generation 0 was successfully built, and it has passed the X/Y sufficient rank test.&amp;nbsp; At first glance, the recommendations seem valid, if slightly skewed for the most popular items (which is expected).&amp;nbsp; I obviously need to work on a rescorer to minimize the over-represented items.&amp;nbsp; Next thing on the list is to see if it can be fit better, so I obviously need to come up with an automated unit test.&lt;BR /&gt;&lt;BR /&gt;Have you built the Optimizer into the Oryx source code, by chance, or is it just in Myrrix?&lt;/P&gt;</description>
      <pubDate>Mon, 10 Feb 2014 16:01:05 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5949#M18166</guid>
      <dc:creator>JamesConner</dc:creator>
      <dc:date>2014-02-10T16:01:05Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5951#M18167</link>
      <description>That's good, although I am still not sure why it worked fine for me with quite different params. The transformation should not have done much. It could be that the singularity tolerance is too strict, but I doubt it. There's going to be a fairly big rewrite of the computation, to use Spark in some parts for example. As part of that I am going to build evaluation into the pipeline itself, so that it's always tuning as it goes. It's not going to come out soon -- just in design phase -- but the idea is that this should not be something anyone has to do by hand. For practical purposes, I would just proceed with these params for now and return to the idea of optimization later. I am guessing (?) your real data set is different anyway and would require different params. Or for this data set you could use the local build.</description>
      <pubDate>Mon, 10 Feb 2014 16:50:47 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/5951#M18167</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-02-10T16:50:47Z</dc:date>
    </item>
    <item>
      <title>Re: Oryx ALS: X and Y do not have sufficient rank</title>
      <link>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/22425#M18168</link>
      <description>&lt;P&gt;One late reply here: this bug fix may be relevant to the original problem:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;A href="https://github.com/cloudera/oryx/issues/99" target="_blank"&gt;https://github.com/cloudera/oryx/issues/99&lt;/A&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;I'll put this out soon in 1.0.1&lt;/P&gt;</description>
      <pubDate>Mon, 08 Dec 2014 00:08:03 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/Oryx-ALS-X-and-Y-do-not-have-sufficient-rank/m-p/22425#M18168</guid>
      <dc:creator>srowen</dc:creator>
      <dc:date>2014-12-08T00:08:03Z</dc:date>
    </item>
  </channel>
</rss>

