I just want to know whether, and how, someone can code other algorithms to suit their use case. Any information would be valuable and appreciated.
You can, although the project is not really designed to help you implement your own algorithms.
Really, the project's history is this: it was a rapid combination of two stand-alone projects (Myrrix and Cloudera ML) that were complete end-to-end applications, plus a new third pipeline for random decision forests (RDF). They are packaged in a nice, consistent way, but as of 1.x the project is really three self-contained implementations under one roof rather than a framework for building your own. That is, it's much more an application than a library or framework.
You can always hack on and modify the existing implementations. It's not hard, but it means forking. The code base will not undergo any major changes from here, though, so it's stable.
A much broader redesign began several months ago as "Oryx 2": https://github.com/OryxProject/oryx
This will be much more of a simple platform for implementing lambda architectures and ML pipelines on Spark and Spark Streaming.
Thanks for the information. But let's say that I want to code new algorithms from scratch and add them to the existing set. Can I do this? I don't mind if it takes a lot of effort. Also, how do you suggest I start?
If you just want to implement a new distributed algorithm, I would build directly on Crunch or Spark. There's nothing in the project that really helps support a new algorithm.
The pieces you might lift and reuse are the parts that repeatedly take a generation of data and build a model from it, read that model in another process, and expose it through an HTTP REST API. Again, it's less a matter of using a library than of cloning and modifying, but yes, it can be done.
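Those reusable pieces (building a new model "generation", picking up the newest one from a separate reader, and serving it over HTTP) can be illustrated without any framework. This is a hypothetical, stdlib-only Python sketch, not Oryx code: the directory layout `root/<generation>/model.json`, the toy item-count "algorithm" in `build_model`, and the `/top?n=K` endpoint are all assumptions made for the example. In practice `build_model` is where a Spark or Crunch job would go.

```python
import json
import os
import tempfile
import threading
import urllib.request
from collections import Counter
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlparse


def build_model(events):
    """Hypothetical toy 'algorithm': count how often each item appears."""
    return dict(Counter(events))


def publish_generation(root, generation, model):
    """Write one model generation under root/<generation>/model.json."""
    gen_dir = os.path.join(root, f"{generation:08d}")
    os.makedirs(gen_dir, exist_ok=True)
    # Write to a temp file, then rename, so readers never see a partial model.
    fd, tmp_path = tempfile.mkstemp(dir=gen_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(model, f)
    os.replace(tmp_path, os.path.join(gen_dir, "model.json"))


def load_latest(root):
    """Return (generation, model) for the newest complete generation, or (None, None)."""
    gens = sorted(
        d for d in os.listdir(root)
        if d.isdigit() and os.path.exists(os.path.join(root, d, "model.json"))
    )
    if not gens:
        return None, None
    with open(os.path.join(root, gens[-1], "model.json")) as f:
        return int(gens[-1]), json.load(f)


def make_server(root, port=0):
    """HTTP server whose GET /top?n=K returns the K most frequent items."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            url = urlparse(self.path)
            if url.path != "/top":
                self.send_error(404)
                return
            # Re-read on each request so newly published generations are picked up.
            gen, model = load_latest(root)
            if model is None:
                self.send_error(503, "no model yet")
                return
            n = int(parse_qs(url.query).get("n", ["3"])[0])
            top = sorted(model.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
            body = json.dumps({"generation": gen, "top": top}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the example quiet

    return ThreadingHTTPServer(("127.0.0.1", port), Handler)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        publish_generation(root, 1, build_model(["a", "b", "a"]))
        publish_generation(root, 2, build_model(["a", "c", "c", "c"]))
        server = make_server(root)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        port = server.server_address[1]
        with urllib.request.urlopen(f"http://127.0.0.1:{port}/top?n=2") as resp:
            print(json.loads(resp.read()))
        server.shutdown()
        server.server_close()
```

Keeping the builder and the server as separate functions that only share a directory mirrors the "make a model, read it in another process" split: either side can be restarted or replaced without the other noticing.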
Really, Oryx 2 is far more about providing a clean place to drop in your algorithm (or an existing one), your model update logic and your serving endpoints, while it manages the rest. But as I say, it's very early for that at this point.
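The "drop in your algorithm, your update, your serving endpoints" idea can be pictured as a small plug-in contract. This is a hypothetical Python sketch: `PluggableApp`, `build`, `update`, `endpoints` and `Runner` are invented names for illustration and are not Oryx 2's actual API. It only shows the division of labor, where the developer supplies three pieces and a generic runner owns the model and routes requests.

```python
from abc import ABC, abstractmethod
from collections import Counter


class PluggableApp(ABC):
    """Hypothetical plug-in contract: the developer supplies these three pieces."""

    @abstractmethod
    def build(self, history):
        """Batch step: build a full model from all historical data."""

    @abstractmethod
    def update(self, model, new_events):
        """Incremental step: fold a small batch of new events into the current model."""

    @abstractmethod
    def endpoints(self):
        """Serving step: map endpoint names to functions of (model, **params)."""


class ItemCounts(PluggableApp):
    """Toy algorithm dropped into the contract: item popularity counts."""

    def build(self, history):
        return Counter(history)

    def update(self, model, new_events):
        updated = Counter(model)  # copy so the previous model is left unchanged
        updated.update(new_events)
        return updated

    def endpoints(self):
        return {
            "count": lambda model, item: model.get(item, 0),
            "top": lambda model, n=1: [i for i, _ in model.most_common(n)],
        }


class Runner:
    """The 'managing the rest' part: holds the current model and dispatches calls."""

    def __init__(self, app, history):
        self.app = app
        self.model = app.build(history)
        self.routes = app.endpoints()

    def ingest(self, new_events):
        self.model = self.app.update(self.model, new_events)

    def rebuild(self, history):
        self.model = self.app.build(history)

    def call(self, name, **params):
        return self.routes[name](self.model, **params)
```

For example, `Runner(ItemCounts(), ["a", "b", "a"]).call("count", item="a")` returns 2. In lambda-architecture terms, `build` plays roughly the role of the batch layer, `update` the speed layer, and `endpoints` the serving layer.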
Thanks a lot for all the information. I get the point. Looking forward to coding algorithms in Spark. Again, I appreciate all the help.