Distributed Recommender not giving recommendations for all users
Created 10-29-2014 07:32 AM
Hi, I ran the distributed recommender using the RecommenderJob class with the MovieLens data as input. The data has 943 users, but recommendations come out for only around 500+ users. Am I making a mistake anywhere? I provided a CSV file with item id, user id and preference value as input.
Created 10-29-2014 09:17 AM
By default the distributed implementations prune infrequent items and low-similarity item pairs so that they can scale up. With a tiny data set, this can mean some items are removed entirely, or become un-recommendable. While you can change this behavior, it is not useful to run the Hadoop-based job on such a small data set.
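To illustrate the effect, here is a minimal toy sketch in plain Python. It is not Mahout's code and does not use its API; the function name `recommend`, the parameters `min_users_per_item` and `similarity_threshold`, and the sample ratings are all hypothetical, chosen only to show how pruning can leave some users with nothing to recommend.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def recommend(prefs, min_users_per_item=1, similarity_threshold=0.0):
    """Toy item-based recommender (hypothetical, not Mahout's implementation).

    prefs is a list of (user, item, rating) tuples. Returns a dict mapping
    each user who received at least one recommendation to a ranked item list.
    """
    by_user = defaultdict(dict)
    item_vecs = defaultdict(dict)
    for user, item, rating in prefs:
        by_user[user][item] = rating
        item_vecs[item][user] = rating

    # Pruning 1: drop items rated by too few users.
    item_vecs = {i: v for i, v in item_vecs.items() if len(v) >= min_users_per_item}

    # Cosine similarity between the remaining item vectors.
    sims = defaultdict(dict)
    for a, b in combinations(sorted(item_vecs), 2):
        common = item_vecs[a].keys() & item_vecs[b].keys()
        if not common:
            continue
        dot = sum(item_vecs[a][u] * item_vecs[b][u] for u in common)
        norm_a = sqrt(sum(r * r for r in item_vecs[a].values()))
        norm_b = sqrt(sum(r * r for r in item_vecs[b].values()))
        sim = dot / (norm_a * norm_b)
        # Pruning 2: drop item pairs whose similarity is too low.
        if sim >= similarity_threshold:
            sims[a][b] = sim
            sims[b][a] = sim

    # Score unseen items by similarity-weighted ratings of the user's items.
    recs = {}
    for user, ratings in by_user.items():
        scores = defaultdict(float)
        for item, rating in ratings.items():
            for other, sim in sims.get(item, {}).items():
                if other not in ratings:
                    scores[other] += sim * rating
        if scores:  # users with no surviving candidates get no output row
            recs[user] = sorted(scores, key=scores.get, reverse=True)
    return recs


prefs = [
    ("u1", "A", 5), ("u1", "B", 4),
    ("u2", "A", 4), ("u2", "B", 5), ("u2", "C", 3),
    ("u3", "C", 4), ("u3", "D", 5),
    ("u4", "D", 2),
]

print(sorted(recommend(prefs)))                            # ['u1', 'u2', 'u3', 'u4']
print(sorted(recommend(prefs, similarity_threshold=0.5)))  # ['u2', 'u4']
```

With no pruning every user gets a recommendation. Once the weaker item pairs are dropped, u1 and u3 are only connected to items they have already rated, so they disappear from the output even though their ratings were part of the input.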
Created 10-29-2014 10:15 AM
Oh, is that so? But how can we alter this behaviour to get recommendations for all users?
Created 10-29-2014 11:31 AM
The dataset has 100k records, but they correspond to 943 users and 1682 items. The job creates vectors from the data and computes the similarity measure using them. Yes, it's minute considering how far distributed processing can scale, but I am interested to know how to alter the behaviour.
