Member since: 09-29-2015
Posts: 67
Kudos Received: 115
Solutions: 7
My Accepted Solutions
Views | Posted
---|---
1282 | 01-05-2016 05:03 PM
1965 | 12-31-2015 07:02 PM
1818 | 11-04-2015 03:38 PM
2249 | 10-19-2015 01:42 AM
1359 | 10-15-2015 02:22 PM
07-18-2017
10:33 AM
We have a similar problem: a MergeContent processor giving 'is not the most recent version of this FlowFile within this session' and 'is not known in this session' errors. We also have the 'phantom' queue that you describe, a large queue that the processor does not process. But when we restart NiFi, the queue drops to zero.
05-30-2016
02:36 PM
4 Kudos
Predicting stock portfolio losses using Monte Carlo simulation in Spark
Summary
Have you ever asked yourself: what is the most money my stock holdings could lose in a single day? If you own stock through a 401k, a personal trading account, or employer-provided stock options, then you should absolutely ask yourself this question. Now think about how to answer it. Your first guess may be to pick a random number, say 20%, and assume that is the worst-case scenario. While simple, this is likely to be wildly inaccurate, and it certainly doesn't take into account the positive impact of a diversified portfolio. Surprisingly, a good estimate is hard to calculate. Luckily, financial institutions have to do this for their stock portfolios (it is called Value at Risk (VaR)), and we can apply their methods to individual portfolios. In this article we will run a Monte Carlo simulation using real trading data to try to quantify what can happen to your portfolio. You should now go to your broker's website (Fidelity, E*Trade, etc.) and get a list of the stocks that you own and the % that each holding represents of the total portfolio.
How it works
The Monte Carlo method is one that uses repeated sampling to predict a result. As a real-world example, think about how you might predict where your friend is aiming while throwing a dart at a dartboard. If you were following the Monte Carlo method, you'd ask your friend to throw 100 darts with the same aim, and then you'd make a prediction based on the largest cluster of darts. To predict stock returns we are going to pick 1,000,000 previous trading dates at random and see what happened on those dates. The end result is some aggregation of those results.
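To make the idea concrete, here is a minimal sketch in plain Java (the return values are made up; the real simulation samples from the downloaded trading data):

import java.util.Random;

public class SamplingSketch {
    public static void main(String[] args) {
        // Hypothetical daily portfolio returns (%) from past trading days;
        // the real code derives these from actual trading data.
        double[] historicalReturns = {-3.45, -2.12, 1.21, 0.04, -0.56, 2.28};

        int trials = 1_000_000;
        double[] outcomes = new double[trials];
        Random random = new Random();

        // Repeated sampling: for each trial, pick a past trading day at
        // random and record what the portfolio did on that day.
        for (int i = 0; i < trials; i++) {
            outcomes[i] = historicalReturns[random.nextInt(historicalReturns.length)];
        }
        // 'outcomes' now holds 1,000,000 samples, ready to be aggregated
        // as described in the rest of this article.
    }
}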
We will download historical stock trading data from Yahoo Finance and store it in HDFS. Then we will create a table in Spark like the one below and pick a million random dates from it.
Date | GS | AAPL | GE | OIL
---|---|---|---|---
2015-01-05 | -3.12% | -2.81% | -1.83% | -6.06%
2015-01-06 | -2.02% | -0.01% | -2.16% | -4.27%
2015-01-07 | +1.48% | +1.40% | +0.04% | +1.91%
2015-01-08 | +1.59% | +3.83% | +1.21% | +1.07%
Table 1: percent change per day by stock symbol
We combine the column values in the same proportions as your trading account. For example, if on Jan 5th 2015 you had equally invested all of your money in GS, AAPL, GE, and OIL, then you would have lost
% loss on 2015-01-05 = -3.12*(1/4) - 2.81*(1/4) - 1.83*(1/4) - 6.06*(1/4) = -3.455%
At the end of a Monte Carlo simulation we have 1,000,000 values that represent the possible gains and losses. We sort the results and take the 5th percentile, 50th percentile, and 95th percentile to represent the worst-case, average-case, and best-case scenarios.
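Here is a minimal sketch of those two steps in plain Java (class and variable names are illustrative, not the repo's actual API; the three-element outcomes array stands in for the full million):

import java.util.Arrays;

public class PercentileSketch {
    // Combine one day's per-symbol changes using the portfolio weights.
    static double weightedReturn(double[] changes, double[] weights) {
        double total = 0;
        for (int i = 0; i < changes.length; i++) {
            total += changes[i] * weights[i];
        }
        return total;
    }

    public static void main(String[] args) {
        // Equal GS/AAPL/GE/OIL split and the 2015-01-05 row of Table 1.
        double[] weights = {0.25, 0.25, 0.25, 0.25};
        double[] jan5 = {-3.12, -2.81, -1.83, -6.06};
        System.out.printf("return on 2015-01-05: %.3f%%%n", weightedReturn(jan5, weights)); // -3.455%

        // With the simulated outcomes in hand, sort them and read off percentiles.
        double[] outcomes = {2.28, -3.33, -0.14}; // stand-in for 1,000,000 simulated returns
        Arrays.sort(outcomes);
        System.out.printf("worst case  (5th pct):  %.2f%%%n", outcomes[(int) (outcomes.length * 0.05)]);
        System.out.printf("most likely (50th pct): %.2f%%%n", outcomes[outcomes.length / 2]);
        System.out.printf("best case   (95th pct): %.2f%%%n", outcomes[(int) (outcomes.length * 0.95)]);
    }
}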
When you run the steps below, you'll see this in the output:
In a single day, this is what could happen to your stock holdings if you have $1000 invested
                        $       %
worst case            -33  -3.33%
most likely scenario   -1  -0.14%
best case              23   2.28%
The code on GitHub also has examples of:
How to use Java 8 Lambda Expressions (see the sketch after this list)
Executing Hive SQL with Spark RDD objects
Unit testing Spark code with hadoop-mini-clusters
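For a taste of the first item, a Java 8 lambda applied to a Spark RDD looks roughly like this (a minimal sketch that assumes the Yahoo Finance CSV layout with "Adj Close" as the seventh column; it is not the repo's actual code):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class LambdaSketch {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("lambda-sketch");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // Parse one column out of a downloaded CSV using Java 8 lambdas
        // instead of anonymous Function classes.
        JavaRDD<Double> adjClose = sc.textFile("hdfs:///tmp/stockData/GE.csv")
                .filter(line -> !line.startsWith("Date"))             // skip the CSV header
                .map(line -> Double.parseDouble(line.split(",")[6])); // "Adj Close" column (assumed)

        System.out.println("Parsed " + adjClose.count() + " rows");
        sc.stop();
    }
}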
Detailed Step-by-step guide
1. Download and install the HDP Sandbox
Download the latest (2.4 as of this writing) HDP Sandbox
here. Import it into VMware or VirtualBox, start the instance, and update the hosts entry on your host machine to point to the new instance's IP. On Mac, edit /etc/hosts; on Windows, edit %systemroot%\system32\drivers\etc\hosts as administrator. Add a line similar to the one below:
192.168.56.102 sandbox sandbox.hortonworks.com
2. Download code and prerequisites
Log into the Sandbox and execute:
# Create a guest user and an HDFS home directory for it
useradd guest
su - hdfs -c "hdfs dfs -mkdir /user/guest; hdfs dfs -chown guest:hdfs /user/guest;"
# Install the Java 8 JDK (the project uses Java 8 features)
yum install -y java-1.8.0-openjdk-devel.x86_64
#update-alternatives --install /usr/lib/jvm/java java_sdk /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.91-0.b14.el6_7.x86_64 100
# Fetch the project source
cd /tmp
git clone https://github.com/vzlatkin/MonteCarloVarUsingRealData.git
3. Update list of stocks that you own
Update companies_list.txt with the list of companies that you own in your stock portfolio and either the portfolio weight (as %/100) or the dollar amount. You should be able to get this information from your broker's website (Fidelity, Scottrade, etc.). Take out any extra commas (,) if you are copying and pasting from the web. The provided sample looks like this:
Symbol,Weight or dollar amount (must include $)
GE,$250
AAPL,$250
GS,$250
OIL,$250
4. Download historical trading data for the stocks you own
Execute:
cd /tmp/MonteCarloVarUsingRealData/
/bin/bash downloadHistoricalData.sh
# Downloading historical data for GE
# Downloading historical data for AAPL
# Downloading historical data for GS
# Downloading historical data for OIL
# Saved to /tmp/stockData/
5. Run the MonteCarlo simulation
Execute:
su - guest -c " /usr/hdp/current/spark-client/bin/spark-submit --class com.hortonworks.example.Main --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 --queue default /tmp/MonteCarloVarUsingRealData/target/monte-carlo-var-1.0-SNAPSHOT.jar hdfs:///tmp/stockData/companies_list.txt hdfs:///tmp/stockData/*.csv"
Interpreting the Results
Below is the result for a sample portfolio that has $1,000 invested equally between Apple, GE, Goldman Sachs, and an ETF that holds crude oil. It says that with 95% certainty, the most the portfolio can lose in a single day is $33. Likewise, there is a 5% chance that the portfolio will gain more than $23 in a single day. The most likely outcome is a small loss of about $1 per day.
In a single day, this is what could happen to your stock holdings if you have $1000 invested
                        $       %
worst case            -33  -3.33%
most likely scenario   -1  -0.14%
best case              23   2.28%
01-18-2018
11:16 AM
Can you please modify the script (enable-ssl.sh) to use configs.py and re-post it?
12-06-2017
04:07 PM
https://community.hortonworks.com/articles/149910/handling-hl7-records-part-1-hl7-ingest.html
https://community.hortonworks.com/articles/149891/handling-hl7-records-and-storing-in-apache-hive-fo.html
https://community.hortonworks.com/articles/149982/hl7-ingest-part-4-streaming-analytics-manager-and.html
https://community.hortonworks.com/articles/150026/hl7-processing-part-3-apache-zeppelin-sql-bi-and-a.html
Attribute Name Cleaner (needed for messy C-CDA and HL7 attribute names): https://github.com/tspannhw/nifi-attributecleaner-processor
09-08-2017
09:36 AM
Oh Jonas, that is excellent. Thanks very much for the link! I've starred your repo and am going to add links to it on my PyTools and Tools repos, which have a whole selection of related Hadoop tools; I think people would be interested in that.
02-07-2016
11:19 PM
Since Oozie is stopped, you can use ij, the Derby interactive SQL tool.
03-06-2016
11:35 PM
1 Kudo
Thanks. I'll watch this JIRA for progress: https://issues.apache.org/jira/browse/HIVE-10924
09-09-2017
08:18 AM
@Jonas Straub Thanks for the article. Is there any way to figure out whether a service check has passed or failed from the API output (and not from the Ambari GUI)? I'm getting the output below but am not sure how to interpret it.
API output:
{
  "href" : "http://<host ip>:8080/api/v1/clusters/DEMO/requests/11",
  "Requests" : {
    "id" : 11,
    "status" : "Accepted"
  }
}
01-21-2016
05:08 AM
1 Kudo
Thanks for reporting it and for providing the stack traces. Very helpful. I've filed an Apache NiFi JIRA for it: https://issues.apache.org/jira/browse/NIFI-1417
02-13-2017
09:49 AM
Having four environments (development, testing, pre-production/staging, and production) in a big company is good practice, because staging lets us make sure everything is working properly before release. Of course, the dev, testing, and staging environments are smaller than the planned production cluster. For instance, if I use 2 nodes each in dev, testing, and staging, then we could have about 8 nodes in production; it always depends on replication, traffic, and other relevant factors. Thanks!