What's the best way to do Monte Carlo simulation on Hadoop

New Contributor

Monte Carlo is one of many simulation types that execute a huge number of repetitive tasks that use relatively little data. The "data" is usually little more than sets of parameters to a function that must be executed a zillion times. Often this is followed by some kind of summarizing process. Clearly a custom MR job can be written for this, but is there a standard framework that HDP recommends, or a published set of best practices?
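For readers who want the workload shape described above made concrete, here is a minimal sketch in plain Python (no Hadoop involved). It assumes a toy simulation, estimating pi by random sampling, and the names `run_trial_batch`, `summarize`, and `make_param_sets` are hypothetical, chosen only for illustration. Each task receives nothing but a small parameter tuple, and a final summarizing step combines the per-task results.

```python
import random


def run_trial_batch(params):
    """One independent task: sample points in the unit square and count
    how many fall inside the quarter circle of radius 1."""
    seed, n_samples = params
    # A per-task generator keeps tasks independent and reproducible.
    # Sequential integer seeds are an assumption made for simplicity here.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, n_samples


def summarize(results):
    """Summarizing step: combine per-task counts into one estimate of pi."""
    total_hits = sum(hits for hits, _ in results)
    total_samples = sum(n for _, n in results)
    return 4.0 * total_hits / total_samples


def make_param_sets(n_tasks, samples_per_task):
    """The whole 'input data set': one tiny parameter tuple per task."""
    return [(seed, samples_per_task) for seed in range(n_tasks)]


if __name__ == "__main__":
    results = [run_trial_batch(p) for p in make_param_sets(8, 50_000)]
    print(summarize(results))
```

The point of the sketch is that the input is only a list of small tuples, while nearly all the cost is CPU time inside `run_trial_batch`. Any framework that can run a function over a list and then aggregate the results fits this pattern.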

Accepted Solutions

Re: What's the best way to do Monte Carlo simulation on Hadoop

@Peter Coates This was brought up by a couple of DS guys. We discussed using Spark (link).

Re: What's the best way to do Monte Carlo simulation on Hadoop

To add to this: as a rule of thumb, Spark is the best choice for executing iterative algorithms, and it helps that MLlib is built in. I no longer see anyone writing MR by hand (the one recent exception was a customer of one of our competitors, who had been misled into believing that 'Hive is slow').

Re: What's the best way to do Monte Carlo simulation on Hadoop

New Contributor
@bsaini

Spark is at its best for iterative computations over large data sets, not for CPU-bound processes that use a small data set repeatedly.

Re: What's the best way to do Monte Carlo simulation on Hadoop

New Contributor

@Peter Coates

Why do you need Spark if the data is very small and can fit on a single node? There are other excellent Monte Carlo simulation packages, open source or otherwise, that can do this efficiently. Even Excel has an add-in for this.

Edit: If you need more horsepower for Monte Carlo simulations than one node can provide, you can look at MPI. MPICH is pretty good: https://www.mpich.org/ There's even a YARN adapter for MPICH: https://github.com/alibaba/mpich2-yarn
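As a rough illustration of the single-node case mentioned in this reply, the sketch below spreads independent Monte Carlo tasks across local CPU cores using Python's standard `concurrent.futures.ProcessPoolExecutor`. It is not MPI and not one of the packages referred to above. The simulation itself (the probability that the sum of several dice exceeds a threshold) and the names `simulate_task`, `estimate`, and `estimate_in_parallel` are assumptions made up for the example.

```python
import random
from concurrent.futures import ProcessPoolExecutor


def simulate_task(params):
    """One CPU-bound task: count trials where the dice sum exceeds threshold."""
    seed, n_trials, n_dice, threshold = params
    rng = random.Random(seed)  # separate seed per task for reproducibility
    successes = 0
    for _ in range(n_trials):
        total = sum(rng.randint(1, 6) for _ in range(n_dice))
        if total > threshold:
            successes += 1
    return successes, n_trials


def estimate(param_sets, map_fn=map):
    """Run every task through map_fn, then summarize into one probability.
    The default built-in map runs serially; a pool's map runs in parallel."""
    results = list(map_fn(simulate_task, param_sets))
    successes = sum(s for s, _ in results)
    trials = sum(n for _, n in results)
    return successes / trials


def estimate_in_parallel(param_sets, max_workers=None):
    """Same computation, with tasks distributed over local worker processes."""
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return estimate(param_sets, map_fn=pool.map)
```

A call such as `estimate_in_parallel([(seed, 100_000, 3, 15) for seed in range(16)])` should be made under an `if __name__ == "__main__":` guard, since process pools need that on platforms that start workers by spawning a fresh interpreter. Because only small tuples travel to the workers and only two integers come back, the communication cost is negligible next to the compute time, which is why a single multi-core machine can be enough for this kind of workload.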

Re: What's the best way to do Monte Carlo simulation on Hadoop

Mentor

@Peter Coates can you accept the best answer to close this thread?
