Created on 11-01-2018 06:46 PM - edited 08-17-2019 05:56 AM
Knowing when to leverage cloud infrastructure and when to use on-premises equipment is at the heart of today's mature enterprise data strategies. I decided to dive deeper into the subject and, as I did with the Personality Detection data science and engineering platform I built, I wanted to use an example to illustrate how to architect a hybrid cloud infrastructure.
These types of endeavors can be quite theoretical (read: hard to follow and potentially boring), so I decided to attach this hybrid cloud platform demonstration to something very concrete: my own health. I signed up for a marathon that will take place on March 29, and I will start training on November 5. I plan to use my training, sleep, nutrition and general health data to evaluate and eventually predict my perceived level of fatigue, which I have dubbed the "Beast Mode Quotient", or BMQ.
This article is an introduction to the architecture and data flows I will put in place. It will refer to sub-articles, each a tutorial that anyone can follow to implement their own hybrid cloud strategy.
The figure below gives an overview of my hybrid cloud platform:
As you can see, it comprises the following elements:
A Data Fabric Layer (Data Plane Services) hosting the BMQ dashboard app, as well as Cloudbreak in order to deploy ephemeral clusters
One permanent cluster, hosting data ingestion from various health sources (e.g. Strava, MyFitnessPal, Fitbit). This cluster will also execute the ML pipeline to predict my level of fatigue
Any number of ephemeral clusters executing the training and generation of the ML pipeline
Enable analytics and model training on the data stored in MySQL, using Zeppelin notebooks and Spark, which would then feed back into the BMQ application
Enable custom applications to consume the data extracted and analyzed
Note: I realize that this architecture showcases a near-future state (Cloudbreak is not technically part of Data Plane Services yet). But this series of articles is a long-term project, so the architecture is set a few weeks into the future.
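To make the split between ephemeral and permanent clusters more tangible, here is a minimal sketch of the hand-off it implies: a model is trained in one place, exported as a small artifact, then loaded and applied somewhere else. Everything in it is an assumption for illustration only: the function names, the JSON format, the one-feature linear model, and the sample numbers are hypothetical stand-ins, not the Spark pipeline the platform will actually use.

```python
import json
from statistics import mean


def train_model(heart_rates, fatigue_scores):
    """Ephemeral-cluster side: fit fatigue = slope * heart_rate + intercept
    by ordinary least squares (a hypothetical stand-in for the real ML pipeline)."""
    x_bar, y_bar = mean(heart_rates), mean(fatigue_scores)
    sxx = sum((x - x_bar) ** 2 for x in heart_rates)
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(heart_rates, fatigue_scores))
    slope = sxy / sxx
    return {"slope": slope, "intercept": y_bar - slope * x_bar}


def export_model(model):
    """Serialize the trained model so it can outlive the ephemeral cluster."""
    return json.dumps(model)


def load_model(payload):
    """Permanent-cluster side: restore the model from its exported form."""
    return json.loads(payload)


def predict_bmq(model, avg_heart_rate):
    """Apply the restored model to a freshly ingested measurement."""
    return model["slope"] * avg_heart_rate + model["intercept"]


# Made-up illustrative values, not real training data.
payload = export_model(train_model([120, 140, 160], [2.0, 4.0, 6.0]))
model = load_model(payload)
print(predict_bmq(model, 150))
```

The point of the sketch is the separation of concerns: the training code can run on a cluster that is torn down afterwards, because the permanent cluster only needs the exported artifact.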
The platform will interact in two major ways, detailed in the flows below.
Data Flow 1: Refreshing Data
Data Flow 2: Generating ML Pipeline using ephemeral clusters
Metrics analyzed and predicted
From a high vantage point, the platform will collect, analyze and predict the following pieces of data. My goal is to log all these pieces of data through various health and fitness services, then create a BMQ predictive model that I will apply to my upcoming training plan.
Avg. Heart Rate
% Light sleep
% Deep sleep
Total Cal. Burned
Avg. Heart Rate
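As a rough illustration, the metrics listed above could be gathered into one record per day before being fed to a model. This is only a sketch under assumptions: the class name, field names, units (beats per minute, kilocalories) and the validation rule are hypothetical and not part of the platform described here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DailyMetrics:
    """One day of collected health data (hypothetical schema)."""
    avg_heart_rate: float          # assumed unit: beats per minute
    pct_light_sleep: float         # percentage of the night, 0-100
    pct_deep_sleep: float          # percentage of the night, 0-100
    total_calories_burned: float   # assumed unit: kilocalories
    bmq: Optional[float] = None    # perceived fatigue; None until logged or predicted

    def validate(self) -> None:
        """Reject sleep percentages that cannot describe a single night."""
        for pct in (self.pct_light_sleep, self.pct_deep_sleep):
            if not 0 <= pct <= 100:
                raise ValueError("sleep percentages must be between 0 and 100")
        if self.pct_light_sleep + self.pct_deep_sleep > 100:
            raise ValueError("light and deep sleep cannot exceed 100% combined")

    def to_features(self) -> List[float]:
        """Return the model inputs in a fixed order (the BMQ target is excluded)."""
        return [
            self.avg_heart_rate,
            self.pct_light_sleep,
            self.pct_deep_sleep,
            self.total_calories_burned,
        ]


# Made-up illustrative values.
day = DailyMetrics(avg_heart_rate=142.0, pct_light_sleep=55.0,
                   pct_deep_sleep=20.0, total_calories_burned=2900.0)
day.validate()
print(day.to_features())
```

Keeping the BMQ value optional lets the same record type represent both logged days, where I have entered my perceived fatigue, and future days, where the model fills it in.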
The implementation of this platform will be detailed in the following tutorial articles: