Created 10-29-2014 07:32 AM
Hi, I ran the distributed recommender using the RecommenderJob class and provided MovieLens data as input. The data has 943 users, but recommendations come out for only around 500+ of them. Am I making a mistake anywhere? I provided a CSV file with item id, user id and preference value as input.
Created 10-29-2014 09:17 AM
By default the distributed implementations will prune infrequent items and low similarity to scale up. With a tiny data set, this can mean some items are removed entirely, or become un-recommendable. While you can change this behavior, it is not useful to run the Hadoop-based job on such a small data set.
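For reference, the pruning is typically controlled through command-line options on the job. The flag names below are from memory of Mahout's item-based RecommenderJob and should be checked against `mahout recommenditembased --help` for your version; paths and values are illustrative only.

```shell
# Hypothetical invocation -- verify flag names against your Mahout version.
mahout recommenditembased \
  --input /path/to/prefs.csv \
  --output /path/to/output \
  --similarityClassname SIMILARITY_COSINE \
  --numRecommendations 10 \
  --maxPrefsPerUser 1000 \
  --maxSimilaritiesPerItem 500 \
  --threshold 0.0
```

Raising `--maxPrefsPerUser` and `--maxSimilaritiesPerItem` keeps more data per user and per item, and lowering `--threshold` keeps low-similarity item pairs that would otherwise be discarded.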
Created 10-29-2014 10:15 AM
Oh, is that so? How can we alter this behaviour to get recommendations for all users?
Created 10-29-2014 11:31 AM
The dataset has 100k records, but they correspond to only 943 users and 1682 items. The job creates vectors from the data and computes the similarity measure over them. Yes, that's minute considering how far distributed processing can scale, but I'm interested to know how to alter the behaviour.
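To make the pruning effect concrete, here is a minimal single-machine sketch of the general idea behind the item-vector and pairwise-similarity step, with a similarity threshold to show how a cutoff can leave a user with nothing recommendable. The data and function names are illustrative assumptions, not Mahout's actual implementation.

```python
import math
from collections import defaultdict

# Toy (user, item, preference) triples -- illustrative only, not MovieLens.
prefs = [
    (1, "A", 5.0), (1, "B", 3.0),
    (2, "A", 4.0), (2, "C", 2.0),
    (3, "B", 4.0), (3, "C", 5.0),
]

# Build one sparse vector per item over the user dimension,
# and keep each user's seen items for scoring.
item_vecs = defaultdict(dict)
user_prefs = defaultdict(dict)
for user, item, pref in prefs:
    item_vecs[item][user] = pref
    user_prefs[user][item] = pref

def cosine(v1, v2):
    """Cosine similarity between two sparse vectors stored as dicts."""
    common = set(v1) & set(v2)
    if not common:
        return 0.0
    dot = sum(v1[k] * v2[k] for k in common)
    norm1 = math.sqrt(sum(x * x for x in v1.values()))
    norm2 = math.sqrt(sum(x * x for x in v2.values()))
    return dot / (norm1 * norm2)

def recommend(user, threshold=0.0, top_n=5):
    """Score unseen items by similarity-weighted preferences.

    Item pairs with similarity <= threshold are pruned -- the kind of
    cutoff that can leave some users with no recommendations at all.
    """
    seen = user_prefs[user]
    scores = {}
    for item in item_vecs:
        if item in seen:
            continue
        score = 0.0
        for s, pref in seen.items():
            sim = cosine(item_vecs[item], item_vecs[s])
            if sim > threshold:
                score += sim * pref
        if score > 0.0:
            scores[item] = score
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```

With the threshold at 0.0 every user with overlapping items gets a recommendation; raise the threshold and some users drop out entirely, which mirrors the behaviour seen in the distributed job on a small dataset.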