Market Basket Analysis algorithm using Spark MLlib


Hi experts,

I have the following dataset (just an example):

Customer_ID Product_Desc
1 Jeans
1 T-Shirt
1 Food
2 Jeans
2 Food
2 Nightdress
2 T-Shirt
2 Hat
3 Jeans
3 Food
4 Food
4 Water
5 Water
5 Food
5 Beer

Is there an algorithm available that lets me predict consumer behavior like this: "When a customer buys Jeans, they also buy Food"?

The algorithms that I've found only calculate the most common products, not the associations between them :( Does anyone know a good tutorial that shows how to predict the kind of association I described above?

The first step is to derive these relationships:


Does anyone have an idea?

Many thanks!!!

Re: Market Basket Analysis algorithm using Spark MLlib


It sounds like you're looking for collaborative filtering, which does exist in spark.mllib. Amazon published a paper in 2003, "Amazon.com Recommendations: Item-to-Item Collaborative Filtering", which describes the algorithm in more detail. Quoting from the paper:

"the algorithm finds items similar to each of the user’s purchases and ratings, aggregates those items, and then recommends the most popular or correlated items".

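To make the quoted description concrete, here is a minimal sketch of the item-to-item idea in plain Python, run on the sample data from the question. It is not the Spark MLlib API and not necessarily the exact method from the paper: the function names item_similarities and recommend are hypothetical, and cosine similarity over the sets of buyers is just one common choice of similarity measure that I'm assuming here.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Sample purchases from the question: (customer_id, product)
purchases = [
    (1, "Jeans"), (1, "T-Shirt"), (1, "Food"),
    (2, "Jeans"), (2, "Food"), (2, "Nightdress"), (2, "T-Shirt"), (2, "Hat"),
    (3, "Jeans"), (3, "Food"),
    (4, "Food"), (4, "Water"),
    (5, "Water"), (5, "Food"), (5, "Beer"),
]

def item_similarities(purchases):
    """Cosine similarity between items, based on which customers bought them.

    Hypothetical helper for illustration; not part of any library.
    """
    buyers = defaultdict(set)  # product -> set of customers who bought it
    for customer, product in purchases:
        buyers[product].add(customer)

    sims = defaultdict(dict)
    for a, b in combinations(sorted(buyers), 2):
        common = len(buyers[a] & buyers[b])  # customers who bought both
        if common:
            # Normalizing by popularity keeps "everyone buys it" items
            # from dominating every recommendation.
            score = common / sqrt(len(buyers[a]) * len(buyers[b]))
            sims[a][b] = score
            sims[b][a] = score
    return sims

def recommend(basket, sims, top_n=3):
    """Aggregate items similar to those in the basket and rank by total score."""
    scores = defaultdict(float)
    for item in basket:
        for other, score in sims.get(item, {}).items():
            if other not in basket:
                scores[other] += score
    # Highest score first; ties broken alphabetically for stable output
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

sims = item_similarities(purchases)
print(sims["Jeans"]["Food"])
print(recommend({"Jeans"}, sims))
```

With this data, T-Shirt ends up ranked slightly ahead of Food for a basket containing Jeans: every Jeans buyer also bought Food, but Food is bought by all five customers, so the normalization discounts it. That is the difference between a similarity score and a raw occurrence count.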
Re: Market Basket Analysis algorithm using Spark MLlib


Alex Woolford, many thanks for your help :) In this case, how can I plan this as a machine learning project? What I'm seeing is that the algorithms I've looked at so far only count occurrences. Thanks!