Code Repositories
Repo Description

This project contains publicly available clinical trials data and the code used to analyze it. I used Spark (PySpark) for the analysis and Zeppelin as the code editor and visualization tool.

The purpose of this project is to provide an example of how to analyze both structured and unstructured data using Spark. This involves parsing and cleansing the raw data, applying text analytics to find data-driven topics, and using the basic visualization techniques available within Zeppelin.

Repo Info
GitHub repo URL: https://github.com/zaratsian/pyspark/tree/master/clinical_analysis
GitHub account name: zaratsian
Repo name: clinical_analysis
Version history
Revision #: 1 of 1
Last update: 12-10-2016 01:19 AM