Created on 11-01-2018 06:46 PM - edited 09-16-2022 01:44 AM
Introduction
Knowing how and when to leverage cloud infrastructure, and when to use on-premises equipment, is at the heart of today's mature enterprise data strategies. I decided to dive deeper into the subject and, as I did with the Personality Detection data science and engineering platform I built, use a concrete example to illustrate how to approach architecting a hybrid cloud infrastructure.
These types of endeavors can be quite theoretical (read: hard to follow and potentially boring), so I decided to tie this hybrid cloud platform demonstration to something very concrete: my own health. I signed up for a marathon that will take place on March 29, and I start training on November 5. I plan to use my training, sleep, nutrition and general health data to evaluate, and eventually predict, my perceived level of fatigue, which I have dubbed the "Beast Mode Quotient" or BMQ.
This article is an introduction to the architecture and data flows I will put in place. It will link to follow-up tutorial articles that anyone can use to implement their own hybrid cloud strategy.
Architecture overview
The figure below gives an overview of my hybrid cloud platform:
As you can see, it comprises the following elements:
- A Data Fabric layer (Data Plane Services) hosting the BMQ dashboard app, as well as Cloudbreak to deploy ephemeral clusters
- One permanent cluster, hosting data ingestion from various health sources (e.g. Strava, MyFitnessPal, Fitbit). This cluster will also run the ML pipeline that predicts the level of fatigue
- Any number of ephemeral clusters executing the training and generation of the ML pipeline
- Enable analytics and model training on the data stored in MySQL using Zeppelin notebooks and Spark, feeding the results back into the BMQ application
- Enable custom applications to consume the extracted and analyzed data
Note: I realize that this architecture shows a near-future state (Cloudbreak is not technically part of Data Plane Service yet). But this series of articles is a long-term project, so the architecture is set a few weeks into the future.
Data Flows
The platform will interact in two major ways, detailed in the flows below.
Data Flow 1: Refreshing Data
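In this architecture, a refresh pulls new readings from the health sources into MySQL on the permanent cluster. Because the same day may be fetched more than once (for example after a late device sync or a corrected food log), the load step should be idempotent. The following is a minimal sketch of that idea, not the actual NiFi flow. It assumes a hypothetical long-format `health_metric` table keyed by source, day and metric, uses Python's built-in `sqlite3` as a runnable stand-in for MySQL, and represents what the ingestion would deliver as plain tuples with sample values:

```python
import sqlite3

# Hypothetical table: one row per (source, day, metric). The real platform
# loads into MySQL; sqlite3 is only a runnable stand-in here.
SCHEMA = """
CREATE TABLE IF NOT EXISTS health_metric (
    source TEXT NOT NULL,  -- e.g. 'strava', 'fitbit'
    day    TEXT NOT NULL,  -- ISO date, e.g. '2018-11-05'
    metric TEXT NOT NULL,  -- e.g. 'distance_km', 'steps'
    value  REAL NOT NULL,
    PRIMARY KEY (source, day, metric)
)
"""

# Upsert: insert new readings, overwrite existing ones for the same key.
# (ON CONFLICT ... DO UPDATE needs SQLite 3.24 or later.)
UPSERT = """
INSERT INTO health_metric (source, day, metric, value)
VALUES (?, ?, ?, ?)
ON CONFLICT (source, day, metric) DO UPDATE SET value = excluded.value
"""


def refresh(conn: sqlite3.Connection, rows: list[tuple[str, str, str, float]]) -> None:
    """Load one batch of freshly pulled readings. Re-running a batch, or
    loading a corrected one, updates values instead of duplicating rows."""
    with conn:  # commits on success, rolls back if any row fails
        conn.executemany(UPSERT, rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    refresh(conn, [("strava", "2018-11-05", "distance_km", 8.0),
                   ("fitbit", "2018-11-05", "steps", 12000.0)])
    # A later refresh brings a corrected distance for the same day.
    refresh(conn, [("strava", "2018-11-05", "distance_km", 8.2)])
    print(conn.execute("SELECT COUNT(*) FROM health_metric").fetchone()[0])  # 2
```

On MySQL, the equivalent statement would use `INSERT ... ON DUPLICATE KEY UPDATE` against a primary key on (`source`, `day`, `metric`).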
Data Flow 2: Generating ML Pipeline using ephemeral clusters
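The point of the ephemeral clusters is that they exist only while a training job runs, while the permanent cluster keeps the resulting pipeline for scoring. Below is a minimal sketch of that lifecycle. It assumes a hypothetical `FakeProvisioner` in place of Cloudbreak, so no real Cloudbreak API is called, and caller-supplied `train` and `publish` functions. The cluster is created for one job and torn down in a `finally` block even if training fails:

```python
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Callable, Iterator


@dataclass
class FakeProvisioner:
    """Hypothetical stand-in for a cluster provisioner such as Cloudbreak.
    It only tracks which cluster ids are alive; nothing is deployed."""
    live: set[str] = field(default_factory=set)

    def create(self, blueprint: str) -> str:
        cluster_id = f"{blueprint}-{uuid.uuid4().hex[:8]}"
        self.live.add(cluster_id)
        return cluster_id

    def terminate(self, cluster_id: str) -> None:
        self.live.discard(cluster_id)


@contextmanager
def ephemeral_cluster(prov: FakeProvisioner, blueprint: str) -> Iterator[str]:
    """Create a cluster for the duration of one job and always tear it
    down afterwards, so no idle cluster is left running."""
    cluster_id = prov.create(blueprint)
    try:
        yield cluster_id
    finally:
        prov.terminate(cluster_id)


def generate_pipeline(prov: FakeProvisioner,
                      train: Callable[[str], bytes],
                      publish: Callable[[bytes], None]) -> None:
    """Train on a short-lived cluster, then hand the fitted pipeline to the
    permanent cluster (via the hypothetical `publish`) for scoring."""
    with ephemeral_cluster(prov, "data-science") as cluster_id:
        model = train(cluster_id)
    # The cluster is gone at this point; only the trained artifact remains.
    publish(model)


if __name__ == "__main__":
    prov = FakeProvisioner()
    published: list[bytes] = []
    generate_pipeline(prov, train=lambda cid: f"model-from-{cid}".encode(),
                      publish=published.append)
    print(len(published), len(prov.live))  # 1 0
```

Using a context manager keeps provisioning and teardown in one place, which matters most when the training step can fail halfway through.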
Metrics analyzed and predicted
From a high vantage point, the platform will collect, analyze and predict the following pieces of data. My goal is to log all of this data through various health and fitness services and create a BMQ predictive model that I will then apply to my upcoming training plan.
Activity Data
- Distance
- Duration
- Avg. Heart Rate
- Elevation
- Avg. Pace
Nutrition Data
- Calories in
- % Carbs
- % Proteins
- % Fats
Sleep Data
- Total sleep
- % REM
- % Light sleep
- % Deep sleep
- % Awake
Health Data
- Total Cal. Burned
- Avg. Heart Rate
- Steps
- Elevation
- Weight
Perceived Fatigue
- BMQ: Morning
- BMQ: Pre-Workout
- BMQ: Post-Workout
- BMQ: Evening
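Several of the metrics above are shares rather than raw values. Here is a minimal sketch of how they could be derived. It assumes the nutrition source reports grams per macronutrient and the sleep source reports minutes per stage, and the dictionary keys are hypothetical. Macro shares use the general Atwater factors (4 kcal/g for carbohydrate and protein, 9 kcal/g for fat). Sleep shares are taken relative to total time in bed, so the four stages add up to roughly 100%, and total sleep excludes time awake:

```python
# General Atwater factors: energy per gram of each macronutrient.
KCAL_PER_GRAM = {"carbs": 4.0, "proteins": 4.0, "fats": 9.0}


def macro_percentages(grams: dict[str, float]) -> dict[str, float]:
    """% of macronutrient calories from carbs, proteins and fats.
    Macros missing from the input count as 0 g."""
    kcal = {name: grams.get(name, 0.0) * factor
            for name, factor in KCAL_PER_GRAM.items()}
    total = sum(kcal.values())
    if total == 0:
        return {name: 0.0 for name in kcal}
    return {name: round(100 * value / total, 1) for name, value in kcal.items()}


def sleep_breakdown(minutes: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return (total sleep in minutes, % of time in bed per stage).
    Hypothetical stage keys: 'rem', 'light', 'deep', 'awake'."""
    in_bed = sum(minutes.values())
    total_sleep = in_bed - minutes.get("awake", 0.0)
    if in_bed == 0:
        return 0.0, {stage: 0.0 for stage in minutes}
    shares = {stage: round(100 * m / in_bed, 1) for stage, m in minutes.items()}
    return total_sleep, shares


if __name__ == "__main__":
    print(macro_percentages({"carbs": 300, "proteins": 120, "fats": 70}))
    # {'carbs': 51.9, 'proteins': 20.8, 'fats': 27.3}
    print(sleep_breakdown({"rem": 90, "light": 240, "deep": 60, "awake": 30}))
    # (390, {'rem': 21.4, 'light': 57.1, 'deep': 14.3, 'awake': 7.1})
```

Note that the macro shares are computed from the macronutrients alone, so their calorie total may differ slightly from the logged "Calories in" figure.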
Implementation Tutorials
The implementation of this platform will be detailed in the following upcoming tutorial articles:
- Part 1: Create NiFi flows to ingest API data and flat files and load them into MySQL
- Part 2: Create Cloudbreak blueprints to deploy data-science-ready ephemeral clusters
- Part 3: Use Spark and Zeppelin to train fatigue prediction models
- Part 4: Create an end-to-end application automating BMQ calculation and prediction