Hi, I am fairly new to Hadoop, and I don't know which tool to use to implement EM and clustering algorithms on it.
From what I have studied so far, I think I have to use Apache Spark for clustering and MapReduce to implement EM, but I am not sure. What I need to know is how to implement an algorithm in both Spark and MapReduce. I have downloaded the Sandbox, but I don't know how to write my algorithms in these frameworks. Please advise.
You mentioned that you are new to Hadoop, so please start with these tutorials: http://hortonworks.com/tutorials/
Mahout can be used for Expectation Maximization: https://mahout.apache.org/users/clustering/expectation-maximization.html
Can you provide more context on why you want to implement EM and MST clustering?
What is your use case?
Yes! I want to implement those for my thesis. I have developed an algorithm that is a modified form of EM and MST.
I implemented a Baum-Welch HMM trainer on Mahout using the raw MapReduce API a few years ago. That is an Expectation Maximization algorithm. It can be adapted to Spark with a bit of work. See here:
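Separately from the code linked above, here is a rough sketch of why EM fits the map/reduce pattern at all. This is not the Mahout or Baum-Welch implementation; it is a toy 1-D Gaussian mixture in plain Python, and every name in it (`e_step_map`, `combine`, `m_step`, `em`) is made up for illustration. The idea is that the E-step can run independently per record (map), the per-record statistics can be summed in any order (reduce), and the M-step is a small computation on the summed totals.

```python
import math
from functools import reduce

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian at x."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def e_step_map(x, params):
    """Map: for one record, emit (resp, resp*x, resp*x^2) for each component."""
    weighted = [w * gaussian_pdf(x, m, v) for (w, m, v) in params]
    total = sum(weighted)
    return [(p / total, p / total * x, p / total * x * x) for p in weighted]

def combine(a, b):
    """Reduce: add the sufficient statistics component by component."""
    return [(a0 + b0, a1 + b1, a2 + b2)
            for (a0, a1, a2), (b0, b1, b2) in zip(a, b)]

def m_step(stats, n):
    """Turn summed statistics into new (weight, mean, variance) triples."""
    params = []
    for r, rx, rxx in stats:
        mean = rx / r
        var = max(rxx / r - mean * mean, 1e-6)  # floor keeps variance positive
        params.append((r / n, mean, var))
    return params

def em(data, params, iterations=20):
    """Run a fixed number of EM iterations; each one is a map plus a reduce."""
    for _ in range(iterations):
        stats = reduce(combine, (e_step_map(x, params) for x in data))
        params = m_step(stats, len(data))
    return params

if __name__ == "__main__":
    data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
    start = [(0.5, 0.0, 1.0), (0.5, 6.0, 1.0)]  # (weight, mean, variance)
    for weight, mean, var in em(data, start):
        print(weight, mean, var)
```

In raw MapReduce each iteration would be one job, with the current parameters shipped to the mappers and the driver program looping over jobs. In Spark the same two functions could be passed to an RDD's `map` and `reduce`, with the loop living in the driver, which is why the port tends to be less work than the original.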