Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.

What's the best way to do Monte Carlo simulation on Hadoop

Rising Star

Monte Carlo is one of many simulation types that execute a huge number of repetitive tasks using relatively little data. The "data" is usually little more than sets of parameters to a function that must be executed a zillion times. Often this is followed by some kind of summarizing process. Clearly a custom MR job can be written for this, but are there any standard frameworks that HDP recommends, or a published set of best practices?
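For readers skimming the thread, here is a minimal single-process sketch of the workload shape described above: a small parameter set, one function run many times with different seeds, and a summarizing step. The random-walk model and the names `simulate`, `run_trials`, and `summarize` are made up for illustration and are not from the original question.

```python
import random
import statistics

def simulate(params, seed):
    """One Monte Carlo trial: final value of a random walk (hypothetical model)."""
    rng = random.Random(seed)  # per-trial seed keeps each trial reproducible
    value = params["start"]
    for _ in range(params["steps"]):
        value += rng.gauss(params["drift"], params["volatility"])
    return value

def run_trials(params, n_trials, base_seed=0):
    """Run the same function n_trials times; only the seed changes."""
    return [simulate(params, base_seed + i) for i in range(n_trials)]

def summarize(results):
    """The summarizing step: reduce many results to a few statistics."""
    return {
        "n": len(results),
        "mean": statistics.fmean(results),
        "stdev": statistics.stdev(results),
    }

if __name__ == "__main__":
    # The entire "input data" is this one small dictionary of parameters.
    params = {"start": 100.0, "steps": 50, "drift": 0.1, "volatility": 1.0}
    print(summarize(run_trials(params, 10_000)))
```

The trials are independent, so the list comprehension in `run_trials` is the part any framework would have to distribute.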

1 ACCEPTED SOLUTION

Master Mentor

@Peter Coates This was brought up by a couple of DS guys. We discussed using Spark: link


6 REPLIES


To add to this, as a rule of thumb, Spark is the best choice for executing iterative algorithms. It helps that MLlib is built in. I haven't seen anyone write MR by hand anymore (except for a customer of one of our competitors I met recently, who had been misled into believing 'Hive is slow').
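A rough sketch of what a Spark version of such a job might look like, using the textbook pi estimate as a stand-in for a real simulation. It assumes PySpark is installed; the function names, task count, and sample sizes are placeholders, not anything recommended in this thread.

```python
import random
from operator import add

def count_hits(task):
    """Worker function: count random points that land inside the unit quarter circle."""
    seed, n_samples = task
    rng = random.Random(seed)  # each task gets its own seed
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits

def estimate_pi_spark(n_tasks=64, samples_per_task=1_000_000):
    # Imported here so count_hits can be tried without a Spark install.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("monte-carlo-pi").getOrCreate()
    try:
        tasks = [(seed, samples_per_task) for seed in range(n_tasks)]
        # The RDD holds only tiny (seed, count) tuples; the work is all CPU.
        total_hits = (
            spark.sparkContext
            .parallelize(tasks, numSlices=n_tasks)
            .map(count_hits)
            .reduce(add)
        )
    finally:
        spark.stop()
    return 4.0 * total_hits / (n_tasks * samples_per_task)
```

Calling `estimate_pi_spark()` from a script submitted to the cluster would run one task per seed. Spark is acting here only as a task scheduler, since no large data set is involved.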

New Member
@bsaini

Iterative computations over large data sets are best done in Spark, but that does not apply to CPU-bound processes that use a small data set repeatedly.

New Member

@Peter Coates

Why do you need Spark if the data is very small and can fit on a single node? There are other excellent Monte Carlo simulation packages, open source or otherwise, which can do this efficiently. Even Excel has an add-in for this.

Edit: If you need more horsepower for Monte Carlo simulations than one node can provide, you can look at MPI. MPICH is pretty good: https://www.mpich.org/ There's even a YARN adapter for MPICH: https://github.com/alibaba/mpich2-yarn
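To make the MPI suggestion concrete, here is a hedged sketch using the mpi4py bindings (an assumption; the post only names MPICH). Each rank computes partial sums for a toy integrand, and rank 0 merges them into an estimate with a standard error. The integrand and all function names are hypothetical.

```python
import math
import random
from functools import reduce

def local_stats(rank, n_samples):
    """Per-rank work: return (count, sum, sum of squares) for a toy integrand."""
    rng = random.Random(rank)  # the rank doubles as the seed so streams differ
    total = total_sq = 0.0
    for _ in range(n_samples):
        x = rng.random()
        fx = math.exp(-x * x)  # hypothetical integrand on [0, 1]
        total += fx
        total_sq += fx * fx
    return (n_samples, total, total_sq)

def merge(a, b):
    """Combine two partial results; associative, so merge order does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def finalize(stats):
    """Turn merged sums into an estimate and its standard error."""
    n, total, total_sq = stats
    mean = total / n
    variance = max(total_sq / n - mean * mean, 0.0)
    return mean, math.sqrt(variance / n)

def main(samples_per_rank=1_000_000):
    from mpi4py import MPI  # imported here so the helpers run without MPI

    comm = MPI.COMM_WORLD
    partial = local_stats(comm.Get_rank(), samples_per_rank)
    parts = comm.gather(partial, root=0)  # rank 0 receives a list, others None
    if comm.Get_rank() == 0:
        print(finalize(reduce(merge, parts)))
```

A script that calls `main()` could be launched with something like `mpiexec -n 4 python mc_mpi.py`, where the file name is a placeholder. Only three numbers per rank cross the network, which suits a CPU-bound job with tiny inputs.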

Master Mentor

@Peter Coates, can you accept the best answer to close this thread?
