
Implementing Expectation Maximization and a Minimal Spanning Tree clustering algorithm in Hadoop

Hi, I am fairly new to Hadoop, and I don't know which tools to use to implement EM and clustering algorithms on Hadoop.

From what I have studied so far, I think I need Apache Spark for the clustering and MapReduce for EM, but I am not sure. What I need to know is how to implement an algorithm in Spark and in MapReduce. I have downloaded the Sandbox, but I don't know how to write my algorithms in these frameworks. Please advise.

6 replies

Re: Implementing Expectation Maximization and a Minimal Spanning Tree clustering algorithm in Hadoop

@Sushant Bhargav

Since you mentioned that you are new to Hadoop, please start with the tutorials here: http://hortonworks.com/tutorials/

Mahout can be used for Expectation Maximization https://mahout.apache.org/users/clustering/expectation-maximization.html
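A minimal sketch of how one EM iteration maps onto MapReduce or Spark, assuming a one-dimensional Gaussian mixture. This is not Mahout's implementation, and all function names are hypothetical. Each data split computes responsibilities independently and emits per-component sufficient statistics (the map), the statistics from all splits are summed (the reduce), and the driver runs the M-step on the totals before the next iteration. Plain Python lists stand in for HDFS splits or Spark partitions so the example runs anywhere.

```python
import math
import random
from functools import reduce


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def e_step_partition(points, weights, means, variances):
    """Map side: compute responsibilities for one data split and fold them into
    per-component sufficient statistics [sum r, sum r*x, sum r*x^2]."""
    k = len(weights)
    stats = [[0.0, 0.0, 0.0] for _ in range(k)]
    loglik = 0.0
    for x in points:
        dens = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(dens)
        loglik += math.log(total)
        for j in range(k):
            r = dens[j] / total  # responsibility of component j for point x
            stats[j][0] += r
            stats[j][1] += r * x
            stats[j][2] += r * x * x
    return stats, loglik


def combine(a, b):
    """Reduce side: statistics from different splits are simply added."""
    stats_a, ll_a = a
    stats_b, ll_b = b
    merged = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(stats_a, stats_b)]
    return merged, ll_a + ll_b


def m_step(stats, n):
    """Driver side: re-estimate mixture weights, means and variances from the totals."""
    weights, means, variances = [], [], []
    for s0, s1, s2 in stats:
        s0 = max(s0, 1e-12)  # guard against an empty component
        mean = s1 / s0
        weights.append(s0 / n)
        means.append(mean)
        variances.append(max(s2 / s0 - mean * mean, 1e-6))  # floor keeps variance positive
    return weights, means, variances


def em(partitions, means, iterations=50):
    """Each iteration is one 'job': map over splits, reduce, then M-step.
    Returns weights, means, variances and the log-likelihood of the last E-step."""
    k = len(means)
    n = sum(len(p) for p in partitions)
    weights, variances = [1.0 / k] * k, [1.0] * k
    loglik = float("-inf")
    for _ in range(iterations):
        mapped = [e_step_partition(p, weights, means, variances) for p in partitions]
        stats, loglik = reduce(combine, mapped)
        weights, means, variances = m_step(stats, n)
    return weights, means, variances, loglik


# Usage: two well-separated clusters, spread over four partitions.
rng = random.Random(0)
data = [rng.gauss(0.0, 1.0) for _ in range(200)] + [rng.gauss(10.0, 1.0) for _ in range(200)]
rng.shuffle(data)
partitions = [data[i::4] for i in range(4)]
weights, means, variances, loglik = em(partitions, means=[1.0, 9.0])
print(means, weights)
```

Because the per-split statistics are plain sums, the number of splits does not change the result. In Spark the map step would roughly correspond to an `RDD.mapPartitions` call and the sum to `RDD.reduce`, with the current parameters shipped to the workers on each iteration.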

Spanning Tree Clustering
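For reference, a minimal single-machine sketch of MST-based (single-linkage) clustering, assuming Euclidean points and a target number of clusters k; the function names are hypothetical and this is not the content of the linked resource. It runs Kruskal's algorithm with a union-find structure and stops once k components remain, which is equivalent (up to ties) to building the full minimum spanning tree and cutting its k-1 heaviest edges. On Hadoop, the O(n²) edge list is exactly the part you would need to distribute or approximate.

```python
import math
from itertools import combinations


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def mst_clusters(points, k):
    """Kruskal's MST, stopped early at k components.
    Returns a cluster label (0..k-1) for each point."""
    n = len(points)
    # All pairwise Euclidean edges, lightest first (O(n^2) memory, fine for a sketch).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j) for i, j in combinations(range(n), 2)
    )
    parent = list(range(n))
    components = n
    for _, i, j in edges:
        if components <= k:
            break
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:  # edge joins two trees, so it belongs to the MST
            parent[ri] = rj
            components -= 1
    # Relabel union-find roots as consecutive cluster ids.
    labels = {}
    return [labels.setdefault(find(parent, i), len(labels)) for i in range(n)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(mst_clusters(points, 2))  # [0, 0, 0, 1, 1, 1]
```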

This is just a starting point. Once you are familiar with Hadoop, Spark, and the other components, you can research the various solutions in the Hadoop space further.

Re: Implementing Expectation Maximization and a Minimal Spanning Tree clustering algorithm in Hadoop

Thanks, mate! I am looking into Mahout.

Re: Implementing Expectation Maximization and a Minimal Spanning Tree clustering algorithm in Hadoop

Can you provide more context on why you want to implement EM and MST clustering?

What is your use case?

Re: Implementing Expectation Maximization and a Minimal Spanning Tree clustering algorithm in Hadoop

Yes! I want to implement them for my thesis: I have developed an algorithm that is a modified form of EM and MST.

Re: Implementing Expectation Maximization and a Minimal Spanning Tree clustering algorithm in Hadoop

@Ram Sriharsha

@Sushant Bhargav

I implemented a Baum-Welch HMM trainer for Mahout using the raw MapReduce API a few years ago. Baum-Welch is an Expectation Maximization algorithm, and it can be adapted to Spark with a modest amount of work. See here:

https://issues.apache.org/jira/secure/attachment/1...
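One way to see why EM ports to MapReduce (and from there to Spark) fairly easily: the E-step only has to produce sums over records. Baum-Welch does this with expected transition and emission counts from the forward-backward pass, summed over the training sequences. For brevity, the derivation below uses a one-dimensional Gaussian mixture with K components as an assumed, simpler stand-in, not the content of the attachment above. Each mapper computes the inner sum for its partition, a reducer adds the m partial sums, and the driver applies the M-step before the next iteration.

```latex
% E-step: responsibility of component j for record x_i
r_{ij} = \frac{\pi_j \, \mathcal{N}(x_i \mid \mu_j, \sigma_j^2)}
              {\sum_{l=1}^{K} \pi_l \, \mathcal{N}(x_i \mid \mu_l, \sigma_l^2)}

% Sufficient statistics are sums over records, so they split over partitions P_1, \dots, P_m
S_j^{(q)} = \sum_{i=1}^{N} r_{ij} \, x_i^{q}
          = \sum_{p=1}^{m} \sum_{i \in P_p} r_{ij} \, x_i^{q},
          \qquad q \in \{0, 1, 2\}

% M-step: new parameters depend only on the summed statistics
\pi_j = \frac{S_j^{(0)}}{N}, \qquad
\mu_j = \frac{S_j^{(1)}}{S_j^{(0)}}, \qquad
\sigma_j^2 = \frac{S_j^{(2)}}{S_j^{(0)}} - \mu_j^2
```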

Re: Implementing Expectation Maximization and a Minimal Spanning Tree clustering algorithm in Hadoop

@Sushant Bhargav can you accept the best answer to close this thread?