# Predicting stock portfolio losses using Monte Carlo simulation in Spark

## How it works

The Monte Carlo method uses repeated random sampling to predict a result. As a real-world example, think about how you might predict where your friend is aiming while throwing darts at a dartboard. Following the Monte Carlo method, you would ask your friend to throw 100 darts with the same aim, and then make a prediction based on the largest cluster of darts. To predict stock returns, we are going to pick 1,000,000 previous trading dates at random and see what happened on those dates. The end result is an aggregation of those outcomes.

We will download historical stock trading data from Yahoo Finance and store it in HDFS. Then we will create a table in Spark like the one below and pick a million random dates from it.

| Date | GS | AAPL | GE | OIL |
|---|---|---|---|---|
| 2015-01-05 | -3.12% | -2.81% | -1.83% | -6.06% |
| 2015-01-06 | -2.02% | -0.01% | -2.16% | -4.27% |
| 2015-01-07 | +1.48% | +1.40% | +0.04% | +1.91% |
| 2015-01-08 | +1.59% | +3.83% | +1.21% | +1.07% |

Table 1: percent change per day by stock symbol
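The article does not say how the percentages in Table 1 are derived from the downloaded data. A minimal sketch, assuming they are close-to-close changes, might look like the following; the function name `daily_percent_changes` and the prices are hypothetical and are not taken from the GitHub project.

```python
def daily_percent_changes(closes):
    """Percent change from each closing price to the next one."""
    return [
        (today - previous) / previous * 100
        for previous, today in zip(closes, closes[1:])
    ]

# Hypothetical closing prices for three consecutive trading days.
closes = [100.0, 97.0, 98.94]
changes = daily_percent_changes(closes)

for change in changes:
    print(f"{change:+.2f}%")
```

A series of N closing prices yields N - 1 daily changes, so the first trading day in the history has no row of its own.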

We combine the column values in the same proportions as your trading account. For example, if on Jan 5th 2015 you had invested all of your money equally in GS, AAPL, GE, and OIL, then you would have lost:

```
% loss on 2015-01-05 = -3.12*(1/4) - 2.81*(1/4) - 1.83*(1/4) - 6.06*(1/4)
```
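The same weighted sum can be checked with a few lines of Python. This is only an illustration of the arithmetic using the 2015-01-05 row of Table 1; the helper name `portfolio_change` is made up for this sketch.

```python
# Daily percent changes for 2015-01-05, taken from Table 1.
changes = {"GS": -3.12, "AAPL": -2.81, "GE": -1.83, "OIL": -6.06}

# Equal investment in each symbol: every weight is 1/4.
weights = {symbol: 1 / len(changes) for symbol in changes}

def portfolio_change(changes, weights):
    """Weighted sum of per-symbol percent changes."""
    return sum(changes[symbol] * weights[symbol] for symbol in changes)

loss = portfolio_change(changes, weights)
print(f"{loss:.3f}%")
```

For an equally weighted portfolio this works out to a loss of about 3.455% on that day.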

At the end of a Monte Carlo simulation we have 1,000,000 values that represent the possible gains and losses. We sort the results and take the 5th, 50th, and 95th percentiles to represent the worst-case, most likely (median), and best-case scenarios.
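The sampling and percentile steps can be sketched in plain Python. This is not the Spark code from the GitHub project: it assumes the four rows of Table 1 are the entire history, uses 10,000 trials instead of 1,000,000 to keep it quick, and the names `simulate` and `percentile` are invented for the illustration.

```python
import random

# Table 1 rows: daily percent change per symbol (a toy four-day history).
HISTORY = [
    {"GS": -3.12, "AAPL": -2.81, "GE": -1.83, "OIL": -6.06},
    {"GS": -2.02, "AAPL": -0.01, "GE": -2.16, "OIL": -4.27},
    {"GS": 1.48, "AAPL": 1.40, "GE": 0.04, "OIL": 1.91},
    {"GS": 1.59, "AAPL": 3.83, "GE": 1.21, "OIL": 1.07},
]
WEIGHTS = {"GS": 0.25, "AAPL": 0.25, "GE": 0.25, "OIL": 0.25}

def simulate(history, weights, trials, seed=None):
    """Pick `trials` random historical dates (with replacement) and
    return the weighted portfolio change for each pick, sorted."""
    rng = random.Random(seed)
    per_date = [
        sum(day[symbol] * weight for symbol, weight in weights.items())
        for day in history
    ]
    return sorted(rng.choices(per_date, k=trials))

def percentile(sorted_values, pct):
    """Value at the given percentile position of an already sorted list."""
    index = round(pct / 100 * (len(sorted_values) - 1))
    return sorted_values[index]

outcomes = simulate(HISTORY, WEIGHTS, trials=10_000, seed=42)
worst, median, best = (percentile(outcomes, p) for p in (5, 50, 95))
print(f"worst {worst:.2f}%  median {median:.2f}%  best {best:.2f}%")
```

With only four historical dates the percentiles can only land on one of four values; a longer history gives a much smoother distribution.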

When you run the steps below, you will see output like this:

```
In a single day, this is what could happen to your stock holdings if you have $1000 invested
                          $        %
worst case              -33   -3.33%
most likely scenario     -1   -0.14%
best case                23    2.28%
```

The code on GitHub also has examples of:

1. How to use Java 8 Lambda Expressions
2. Executing Hive SQL with Spark RDD objects
3. Unit testing Spark code with hadoop-mini-clusters

## Detailed Step-by-step guide

Download the latest HDP Sandbox (2.4 as of this writing). Import it into VMware or VirtualBox, start the instance, and update the DNS entry on your host machine to point to the new instance's IP. On Mac, edit /etc/hosts; on Windows, edit the hosts file under %systemroot%\system32\drivers\etc\ as administrator. Add a line similar to the one below:

```
192.168.56.102  sandbox sandbox.hortonworks.com
```

Then, as root on the Sandbox, create a guest user with an HDFS home directory, install the Java 8 JDK, and clone the example code:

```
useradd guest
su - hdfs -c "hdfs dfs -mkdir /user/guest; hdfs dfs -chown guest:hdfs /user/guest"
yum install -y java-1.8.0-openjdk-devel.x86_64
#update-alternatives --install /usr/lib/jvm/java java_sdk /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-0.b14.el6_7.x86_64  100
cd /tmp
git clone https://github.com/vzlatkin/MonteCarloVarUsingRealData.git
```

### 3. Update list of stocks that you own

Update companies_list.txt with the list of companies that you own in your stock portfolio and either the portfolio weight (as %/100) or the dollar amount. You should be able to get this information from your broker's website (Fidelity, Scottrade, etc.). Remove any extra commas (,) if you are copying and pasting from the web, because the file is comma-separated. The provided sample looks like this:

```
Symbol,Weight or dollar amount (must include $)
GE,$250
AAPL,$250
GS,$250
OIL,$250
```
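To see how a file in this format maps to portfolio weights, here is a minimal Python sketch. It is an assumption about the intent of the format, not the parser used in the GitHub project: it strips a leading `$` if present and normalizes the values so they sum to 1, which leaves weights that already sum to 1 unchanged. The function name `load_weights` is hypothetical.

```python
import csv
import io

SAMPLE = """Symbol,Weight or dollar amount (must include $)
GE,$250
AAPL,$250
GS,$250
OIL,$250
"""

def load_weights(text):
    """Parse 'symbol,value' rows and return weights that sum to 1."""
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    amounts = {}
    for symbol, value in rows[1:]:  # rows[0] is the header line
        amounts[symbol.strip()] = float(value.strip().lstrip("$"))
    total = sum(amounts.values())
    return {symbol: amount / total for symbol, amount in amounts.items()}

weights = load_weights(SAMPLE)
for symbol, weight in weights.items():
    print(f"{symbol}: {weight:.2f}")
```

A stray comma inside a value, such as `$1,250`, would split into a third field and break the two-column unpacking, which is why extra commas have to be removed from the file.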

Execute:

```
cd /tmp/MonteCarloVarUsingRealData/
# Saved to /tmp/stockData/
```

### 5. Run the Monte Carlo simulation

Execute:

```
su - guest -c "/usr/hdp/current/spark-client/bin/spark-submit --class com.hortonworks.example.Main --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 --queue default /tmp/MonteCarloVarUsingRealData/target/monte-carlo-var-1.0-SNAPSHOT.jar hdfs:///tmp/stockData/companies_list.txt hdfs:///tmp/stockData/*.csv"
```

## Interpreting the Results

Below is the result for a sample portfolio that has \$1,000 invested equally among Apple, GE, Goldman Sachs, and an ETF that holds crude oil. It says that, with 95% certainty, the most the portfolio will go down in a single day is \$33. In addition, there is a 5% chance that the portfolio will gain \$23 or more in a single day. In the median case, the portfolio loses about \$1 in a day.

```
In a single day, this is what could happen to your stock holdings if you have $1000 invested
                          $        %
worst case              -33   -3.33%
most likely scenario     -1   -0.14%
best case                23    2.28%
```
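The dollar column appears to be the percentage applied to the invested amount and rounded to whole dollars. That rounding is inferred from the sample numbers, not confirmed by the project's code. A small sketch that reproduces the three rows, with the hypothetical helper `to_dollars`:

```python
INVESTED = 1000  # dollars, as in the sample portfolio

# Percentile results copied from the sample output.
scenarios = [
    ("worst case", -3.33),
    ("most likely scenario", -0.14),
    ("best case", 2.28),
]

def to_dollars(invested, percent_change):
    """Convert a percent change into a whole-dollar gain or loss."""
    return round(invested * percent_change / 100)

for label, pct in scenarios:
    print(f"{label:<22}{to_dollars(INVESTED, pct):>6}{pct:>8.2f}%")
```

Because the dollar figures scale linearly with the invested amount, the same percentages applied to a \$10,000 portfolio would give roughly ten times the gain or loss.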