Support Questions

pcoates · ‎11-18-2015

Monte Carlo is one of many simulation types that execute a huge number of repetitive tasks using relatively little data. The "data" is usually little more than sets of parameters to a function that must be executed a zillion times. Often this is followed by some kind of summarizing process. Clearly a custom MR job can be written for this, but are there any standard frameworks that HDP recommends, or a published set of best practices?

nsabharwal · ‎11-18-2015

@Peter Coates This was brought up by a couple of DS guys. We discussed using Spark (link).
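For readers wondering what the Spark approach might look like, here is a minimal PySpark sketch. It is not taken from the linked discussion. It assumes pyspark is installed, and the names count_hits and estimate_pi_on_spark, the pi-estimation toy problem, and the task sizes are all made up for illustration. The "data" is just a list of (seed, n_trials) tuples, the function runs once per tuple, and reduce does the summarizing.

```python
import random


def count_hits(task):
    """Run one batch of trials; `task` is a hypothetical (seed, n_trials) tuple."""
    seed, n_trials = task
    rng = random.Random(seed)  # per-task seed keeps each batch reproducible
    hits = 0
    for _ in range(n_trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # point landed inside the quarter circle
            hits += 1
    return hits


def estimate_pi_on_spark(n_tasks=200, trials_per_task=100_000):
    """Distribute the batches with Spark and sum the hit counts."""
    # Imported lazily so count_hits can be used and tested without Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("monte-carlo-pi-sketch").getOrCreate()
    try:
        tasks = [(seed, trials_per_task) for seed in range(n_tasks)]
        # The input is tiny, so parallelism comes from the number of
        # partitions (one per task here), not from the data volume.
        total_hits = (
            spark.sparkContext.parallelize(tasks, numSlices=n_tasks)
            .map(count_hits)
            .reduce(lambda a, b: a + b)
        )
    finally:
        spark.stop()
    return 4.0 * total_hits / (n_tasks * trials_per_task)
```

Calling estimate_pi_on_spark() from a script submitted with spark-submit would run the batches on the cluster. Setting the partition count explicitly matters for this kind of job, because a small input list does not by itself give Spark a reason to spread the work widely.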

bsaini · ‎11-18-2015

To add to this: as a rule of thumb, Spark is the best choice for executing iterative algorithms. It helps that MLlib is built in. I haven't seen anyone writing MR by hand anymore (except for a customer of one of our competitors I met recently, who had been misled into believing 'Hive is slow').

dkumar1 · ‎12-04-2015

@bsaini

Iterative computations are best in Spark for large data sets, not for CPU-bound processes that use a small data set repeatedly.

dkumar1 · ‎12-04-2015

@Peter Coates

Why do you need Spark if the data is very small and can fit on a single node? There are other excellent Monte Carlo simulation packages, open source or otherwise, that can do this efficiently. Even Excel has an add-in for this.
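To make the single-node option concrete, here is a rough sketch using only the Python standard library (Python 3.8+). It is not one of the packages mentioned above. The parameter layout (seed, mean, stdev, n_trials) and the function names are hypothetical, chosen only to show the shape: one parameter set in, one number out, then a summary over all results.

```python
import random
import statistics
from concurrent.futures import ProcessPoolExecutor


def simulate(params):
    """One parameter set in, one summary number out (mean of the draws)."""
    seed, mean, stdev, n_trials = params
    rng = random.Random(seed)  # seeded per parameter set for reproducibility
    return statistics.fmean(rng.gauss(mean, stdev) for _ in range(n_trials))


def run_all(param_sets, workers=None):
    """Fan the parameter sets out over local CPU cores; results keep input order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # chunksize batches several parameter sets per inter-process message
        return list(pool.map(simulate, param_sets, chunksize=16))


def main():
    param_sets = [(seed, 0.05, 0.2, 10_000) for seed in range(1_000)]
    results = run_all(param_sets)
    # The "summarizing process": mean and spread across all runs.
    print(statistics.fmean(results), statistics.stdev(results))
```

When running this as a script, call main() under an `if __name__ == "__main__":` guard, which process pools need on platforms that spawn worker processes. If one machine with all cores busy finishes in acceptable time, a cluster adds little for this kind of workload.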

edit: If you need more horsepower for Monte Carlo simulations than one node can provide, you can look at MPI. MPICH is pretty good: https://www.mpich.org/ There's even a YARN adapter for MPICH: https://github.com/alibaba/mpich2-yarn
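As an illustration of the MPI route, here is a small sketch using mpi4py, a Python binding for MPI. mpi4py is an assumption here (the post mentions only MPICH itself), and the integrand, function names, and trial counts are invented for the example. Each rank runs its own trials independently and a single reduce collects the result on rank 0.

```python
import math
import random


def local_sum(rank, n_trials):
    """Sum of exp(-x^2) at uniform random x in [0, 1); the rank seeds the RNG."""
    rng = random.Random(rank)
    return sum(math.exp(-rng.random() ** 2) for _ in range(n_trials))


def main(trials_per_rank=1_000_000):
    # Assumption: mpi4py is installed and built against an MPI library
    # such as MPICH. Imported lazily so local_sum works without MPI.
    from mpi4py import MPI

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    partial = local_sum(rank, trials_per_rank)
    # Sum the per-rank partial results onto rank 0; other ranks get None.
    total = comm.reduce(partial, op=MPI.SUM, root=0)
    if rank == 0:
        # Monte Carlo estimate of the integral of exp(-x^2) over [0, 1]
        print("estimate:", total / (size * trials_per_rank))
```

With a call to main() at the bottom of a script (the file name is up to you), something like `mpiexec -n 8 python script.py` would start eight ranks. The only communication is the final reduce, which is why this style suits CPU-bound simulations with little data.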

aervits · ‎02-02-2016

@Peter Coates can you accept the best answer to close this thread?

vzlatkin · ‎06-03-2016

Here is an example: https://community.hortonworks.com/articles/36321/predicting-stock-portfolio-losses-using-monte-carl....

What's the best way to do Monte Carlo simulation on Hadoop