Created 07-28-2016 10:54 PM
Hi,
I am looking for references and demos that show the text and data mining capabilities of our platform.
I am trying to answer one of the RFP questions.
Any help is highly appreciated.
Thanks,
Sujitha
Created 07-29-2016 03:10 AM
I've placed a few PySpark scripts on my GitHub: https://github.com/zaratsian/pyspark. You can demo these projects by copying the note.json link into Zeppelin Hub Viewer.
When working with text / unstructured data, there are a few things to keep in mind:
This process will help you understand your text by (1) finding data-driven topics using matrix reduction / clustering techniques, or (2) using the term-document matrix as input features to predict an outcome (probability of failure, likelihood to churn, etc.).
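To make the term-document matrix idea concrete, here is a minimal sketch in plain Python (not Spark, so it runs without a cluster). The sample documents and the helper names `tokenize` and `term_document_matrix` are made up for illustration and are not taken from the GitHub repo above. In a real pipeline you would likely also remove stop words and apply TF-IDF weighting before clustering or modeling.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs):
    """Return (vocab, matrix) where matrix[i][j] is the count of vocab[j] in docs[i]."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix


# Hypothetical sample documents, for illustration only
docs = [
    "Pump failure reported after pump overheated",
    "Customer called about billing",
    "Billing error, customer may churn",
]

vocab, matrix = term_document_matrix(docs)

# Each row is one document, each column one term
for doc, row in zip(docs, matrix):
    print(row, "<-", doc)
```

Each row can then be fed to a clustering or topic method (option 1) or used as a feature vector for a classifier (option 2).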
You may also want to check out Word2Vec (I have an example on my GitHub).
Hope this helps!