Market Basket Analysis algorithm using Spark Mllib


Explorer

Hi experts,

I have the following dataset (just an example):

Customer_ID Product_Desc
1 Jeans
1 T-Shirt
1 Food
2 Jeans
2 Food
2 Nightdress
2 T-Shirt
2 Hat
3 Jeans
3 Food
4 Food
4 Water
5 Water
5 Food
5 Beer

Is there an algorithm that lets me predict consumer behavior like this: "When a customer buys Jeans, they also buy Food"?

The algorithms that I've found only calculate the most common products, not the association between them :( Does anyone know a good tutorial that shows how to predict the kind of association I described above?

The first step is to derive these relationships (one basket per customer):

Jeans-T-Shirt-Food
Jeans-Food-Nightdress-T-Shirt-Hat
Jeans-Food
Food-Water
Water-Food-Beer
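
To make the goal concrete, here is a minimal plain-Python sketch (not Spark) of that first step plus the association measure I am after. It assumes the sample rows above; the names `rows`, `baskets`, `support` and `confidence` are mine, chosen only for illustration. Support is the fraction of baskets that contain an itemset, and confidence of "Jeans -> Food" is support(Jeans and Food) divided by support(Jeans).

```python
from collections import defaultdict

# Sample data from the post: (Customer_ID, Product_Desc) pairs.
rows = [
    (1, "Jeans"), (1, "T-Shirt"), (1, "Food"),
    (2, "Jeans"), (2, "Food"), (2, "Nightdress"), (2, "T-Shirt"), (2, "Hat"),
    (3, "Jeans"), (3, "Food"),
    (4, "Food"), (4, "Water"),
    (5, "Water"), (5, "Food"), (5, "Beer"),
]

# Step 1: group the products by customer to get one basket per customer.
baskets = defaultdict(set)
for customer_id, product in rows:
    baskets[customer_id].add(product)


def support(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets.values()) / len(baskets)


def confidence(antecedent, consequent):
    """Estimated P(consequent in basket | antecedent in basket)."""
    return support(set(antecedent) | set(consequent)) / support(antecedent)


if __name__ == "__main__":
    print(confidence({"Jeans"}, {"Food"}))
    print(confidence({"Food"}, {"Jeans"}))
```

On this sample every basket with Jeans also contains Food, so the rule "Jeans -> Food" has confidence 1.0, while "Food -> Jeans" only reaches 0.6. As far as I can tell, this is what frequent-pattern mining (for example the FP-growth implementation in Spark MLlib's `fpm` package) computes at scale, but I have not verified it on my data.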

Does anyone have an idea?

Many thanks!!!
2 REPLIES

Re: Market Basket Analysis algorithm using Spark Mllib

Rising Star

It sounds like you're looking for collaborative filtering, which does exist in spark.mllib: http://spark.apache.org/docs/1.6.2/mllib-collaborative-filtering.html

Amazon.com published a paper in 2003: "Amazon.com Recommendations: Item-to-Item Collaborative Filtering" which describes the algorithm in more detail. Quoting from the paper:

"the algorithm finds items similar to each of the user’s purchases and ratings, aggregates those items, and then recommends the most popular or correlated items".
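
The idea in that quote can be sketched in a few lines of plain Python. This is only an illustration under my own assumptions, not the paper's implementation and not the spark.mllib API: it uses the sample baskets from the question, treats every purchase as an implicit rating of 1, and measures item similarity as the cosine between the sets of customers who bought each item. The names `purchases`, `buyers`, `similarity` and `recommend` are hypothetical.

```python
from collections import defaultdict
from math import sqrt

# Baskets from the question: customer -> set of purchased products.
purchases = {
    1: {"Jeans", "T-Shirt", "Food"},
    2: {"Jeans", "Food", "Nightdress", "T-Shirt", "Hat"},
    3: {"Jeans", "Food"},
    4: {"Food", "Water"},
    5: {"Water", "Food", "Beer"},
}

# Invert the mapping: product -> set of customers who bought it.
buyers = defaultdict(set)
for customer, items in purchases.items():
    for item in items:
        buyers[item].add(customer)


def similarity(a, b):
    """Cosine similarity between two items' binary purchase vectors."""
    shared = len(buyers[a] & buyers[b])
    return shared / sqrt(len(buyers[a]) * len(buyers[b]))


def recommend(customer, top_n=3):
    """Aggregate similarity to the customer's items; rank unseen items."""
    owned = purchases[customer]
    scores = defaultdict(float)
    for item in owned:
        for other in buyers:
            if other not in owned:
                scores[other] += similarity(item, other)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]


if __name__ == "__main__":
    print(recommend(3, top_n=1))
```

For customer 3, who bought Jeans and Food, the top suggestion on this sample is T-Shirt, because T-Shirt buyers overlap most with the buyers of both items. With real data you would likely want explicit ratings or purchase counts rather than a flat 1 per purchase.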

Re: Market Basket Analysis algorithm using Spark Mllib

Explorer

Alex Woolford, many thanks for your help :) In your view, how could I plan this as a machine learning project? The algorithms that I've seen so far only count occurrences. Thanks!