Member since
02-23-2016
51
Posts
96
Kudos Received
4
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1427 | 05-25-2016 04:42 PM |
| | 2538 | 05-16-2016 01:09 PM |
| | 959 | 04-27-2016 05:40 PM |
| | 3995 | 02-26-2016 02:14 PM |
05-12-2016
05:19 PM
1 Kudo
Great feature, @bbende. This is much easier to visualize now.
05-12-2016
05:12 PM
3 Kudos
In the case where NiFi is reading from 30 database tables in a single flow, what is the best way to visually identify which processor connects to which database and table?
Labels:
- Apache NiFi
05-11-2016
01:55 PM
7 Kudos
You can now visualize any Zeppelin notebook using the ZeppelinHub viewer: https://www.zeppelinhub.com/viewer

Personal likes:
1. No need to sign up or register; just paste a link.
2. I've been posting my Zeppelin notebooks to GitHub, but everyone who wants to visualize or interact with them has to download them, move them to their environment, and import them into their own instance of Zeppelin. Not anymore: just paste the link.
3. Less need to take screenshots and create a PowerPoint; just send the hyperlink.

Examples:
Stock Variance Notebook on GitHub - https://github.com/kirkhas/zeppelin-notebooks/blob/master/stock-variance/note.json
vs. Stock Variance Notebook on ZeppelinHub - https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2tpcmtoYXMvemVwcGVsaW4tbm90ZWJvb2tzL21hc3Rlci9zdG9jay12YXJpYW5jZS9ub3RlLmpzb24
Credit Card Fraud Transactions on GitHub - https://github.com/vakshorton/CreditCardTransactionMonitor/blob/master/Zeppelin/notebook/2BGDWYZV9/note.json
vs. https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL3Zha3Nob3J0b24vQ3JlZGl0Q2FyZFRyYW5zYWN0aW9uTW9uaXRvci9tYXN0ZXIvWmVwcGVsaW4vbm90ZWJvb2svMkJHRFdZWlY5L25vdGUuanNvbg
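For reference, the viewer link is just the base64-encoded raw-notebook URL appended to the viewer base path, so you can build one yourself. A minimal Scala sketch (the helper name is mine, not a ZeppelinHub API):

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical helper: base64-encode the raw notebook URL (no padding)
// and append it to the ZeppelinHub viewer base path.
def zeppelinHubViewerLink(rawNotebookUrl: String): String = {
  val encoded = Base64.getUrlEncoder.withoutPadding
    .encodeToString(rawNotebookUrl.getBytes(StandardCharsets.UTF_8))
  s"https://www.zeppelinhub.com/viewer/notebooks/$encoded"
}

val raw = "https://raw.githubusercontent.com/kirkhas/zeppelin-notebooks/master/stock-variance/note.json"
println(zeppelinHubViewerLink(raw))
```

Decoding the links in the examples above confirms they are the raw GitHub notebook URLs.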
05-11-2016
01:55 PM
1 Kudo
In addition to the HWX install guides online, this is a great best-practices article for groups that want to consider some design options prior to install: http://hortonworks.com/blog/best-practices-in-hdfs-authorization-with-apache-ranger/
05-11-2016
01:04 PM
1 Kudo
Repo Description
Zeppelin Notebook Stock Variance Example. Check it out on the ZeppelinHub viewer:
https://www.zeppelinhub.com/viewer/notebooks/aHR0cHM6Ly9yYXcuZ2l0aHVidXNlcmNvbnRlbnQuY29tL2tpcmtoYXMvemVwcGVsaW4tbm90ZWJvb2tzL21hc3Rlci9zdG9jay12YXJpYW5jZS9ub3RlLmpzb24
Repo Info
Github Repo URL: https://github.com/kirkhas/zeppelin-notebooks
Github account name: kirkhas
Repo name: zeppelin-notebooks
04-30-2016
03:05 AM
Vadim and I at Hortonworks have a packaged demo of exactly what I described above on GitHub: data streaming through NiFi into a Spark model. See https://github.com/vakshorton
04-30-2016
03:03 AM
1 Kudo
The more traditional approach in this situation is to use NiFi to read the incoming data, then add a NiFi processor to deliver the data from the NiFi queue to either Storm or, in your case, Spark Streaming. Now you can build a Spark ML model, test it, and run it in Spark. You can push logic into NiFi, but an ML model inside NiFi is overkill.
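A rough sketch of that handoff, using the nifi-spark-receiver integration (the NiFi URL, the output-port name "toSpark", and the comma-separated record format are all assumptions for illustration, not part of the question above):

```scala
import org.apache.nifi.remote.client.SiteToSiteClient
import org.apache.nifi.spark.NiFiReceiver
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Pure parsing step, kept separate so it can be tested without a cluster.
// The "trader,price,qty" record format is illustrative.
case class Trade(trader: String, price: Double, qty: Int)

def parseTrade(line: String): Trade = {
  val Array(trader, price, qty) = line.split(",")
  Trade(trader, price.toDouble, qty.toInt)
}

def main(args: Array[String]): Unit = {
  val conf = new SparkConf().setAppName("nifi-to-spark-streaming")
  val ssc = new StreamingContext(conf, Seconds(10))

  // NiFi site-to-site: pull FlowFiles from an output port on the NiFi side.
  val s2sConfig = new SiteToSiteClient.Builder()
    .url("http://nifi-host:8080/nifi") // illustrative host
    .portName("toSpark")               // illustrative port name
    .buildConfig()

  val packets = ssc.receiverStream(
    new NiFiReceiver(s2sConfig, StorageLevel.MEMORY_ONLY))

  // FlowFile content -> typed records ready for a Spark ML pipeline.
  val trades = packets
    .map(p => new String(p.getContent))
    .map(parseTrade)

  trades.print()
  ssc.start()
  ssc.awaitTermination()
}
```

This keeps the ML model in Spark where it belongs, with NiFi handling ingestion and back pressure.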
04-27-2016
06:18 PM
1 Kudo
In the latest version of the Hortonworks Sandbox, 2.4 (you can download it for free from hortonworks.com), Zeppelin and Spark run out of the box. The Spark version is 1.6, and the toJSON method works; just make sure you run it on a DataFrame, not an RDD:

val js = tradesRDD.toDF.toJSON
js.take(2)

output:
Array[String] = Array(
  {"trader":"Kirk","price":11.0,"qty":51,"vol":40000,"product":"goog","time":"2016-03-29 10:38:12.0"},
  {"trader":"Kirk","price":0.0,"qty":66,"vol":40000,"product":"goog","time":"2016-03-29 10:56:12.0"})
04-27-2016
05:40 PM
2 Kudos
This is the common process many go through, and there are many ways to skin the cat here. I prefer the methodology below.
1. Bring in the data with minimal transformation, the "E" and "L". Depending on workload, this could be Sqoop for simple batch, or NiFi for a more modern streaming approach with better control over flow, bi-direction, and back pressure.
2. Decide on a transformation strategy and store a higher-level or "enriched" data set, typically in Hive or HBase. Between Atlas and NiFi you should now have some data lineage. Other formatting might take place here, such as converting to native datatypes (dates vs. timestamps). A partitioning strategy would likely be applied at this stage too. Running a data-cleansing pass at this phase is also a good idea, as is computing feature vectors.
3. Use Zeppelin + Spark to analyze the data.
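A sketch of what step 2 can look like in Spark 1.6-era code. The table, column names, and cleansing rules are made up for illustration; the point is to keep the pure cleansing logic separate from the partitioned write:

```scala
import org.apache.spark.sql.functions._

// Pure cleansing step: normalize product codes before enrichment.
// The normalization rules here are illustrative.
def cleanProduct(raw: String): String =
  Option(raw).map(_.trim.toLowerCase).getOrElse("unknown")

val cleanProductUdf = udf(cleanProduct _)

// rawTrades is assumed to be a DataFrame landed by step 1 (Sqoop or NiFi).
val enriched = rawTrades
  .withColumn("product", cleanProductUdf(col("product")))
  .withColumn("trade_date", to_date(col("time"))) // native date type

// Partition by date so later Zeppelin/Spark queries can prune partitions.
enriched.write
  .partitionBy("trade_date")
  .format("orc")
  .saveAsTable("enriched_trades")
```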
04-27-2016
03:07 PM
1 Kudo
After playing with the Spark 1.6 LinearRegression model, I found it is very sensitive to the step size. What is the best practice for tuning this parameter? The mean squared error of the model I build varies greatly depending on this input.

import java.text.NumberFormat
import org.apache.spark.mllib.regression.LinearRegressionWithSGD

// Build the model
val numIterations = 30
val stepSize = 0.0001
val linearModel = LinearRegressionWithSGD.train(trainingDataRDD, numIterations, stepSize)

// Evaluate the model on the training examples and compute the training error
val valuesAndPreds = trainingDataRDD.map { point =>
  val prediction = linearModel.predict(point.features)
  (point.label, prediction)
}
val MSE = valuesAndPreds.map { case (v, p) => math.pow(v - p, 2) }.mean()
println("training Mean Squared Error = " + NumberFormat.getInstance().format(MSE))
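One common answer is a simple grid search over candidate step sizes, keeping the one with the lowest MSE. The selection logic below is generic and deliberately left abstract; the `evaluate` function would wrap the `LinearRegressionWithSGD.train` call plus the MSE computation above (the helper name and candidate values are mine):

```scala
// Generic grid search: evaluate each candidate step size, keep the
// (stepSize, mse) pair with the minimum MSE.
def tuneStepSize(candidates: Seq[Double])(evaluate: Double => Double): (Double, Double) =
  candidates.map(s => (s, evaluate(s))).minBy(_._2)

// Illustrative usage against the code above:
// val (bestStep, bestMse) =
//   tuneStepSize(Seq(1e-5, 1e-4, 1e-3, 1e-2)) { s =>
//     val m = LinearRegressionWithSGD.train(trainingDataRDD, numIterations, s)
//     trainingDataRDD.map(p => math.pow(p.label - m.predict(p.features), 2)).mean()
//   }
```

Scaling the features (e.g. with StandardScaler) before training also tends to reduce this sensitivity, since SGD step sizes interact badly with features on very different scales.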
Labels:
- Apache Spark
- Apache Zeppelin