Knowing when to leverage cloud infrastructure and when to use on-premises equipment is at the heart of today's mature enterprise data strategies. I decided to dive deeper into the subject and, as I did with the Personality Detection data science and engineering platform I built, use a concrete example to illustrate how to architect a hybrid cloud infrastructure.

These types of endeavors can be quite theoretical (read: hard to follow and potentially boring), so I decided to attach this hybrid cloud platform demonstration to something very concrete: my own health. I signed up for a marathon that will take place on March 29, and I will start training on November 5th. I plan to use my training, sleep, nutrition and general health data to evaluate, and eventually predict, my perceived level of fatigue, which I have named the "Beast Mode Quotient", or BMQ.

This article is an introduction to the architecture and data flows I will put in place. It will refer to sub-articles, written as tutorials that anyone can follow to implement their own hybrid cloud strategy.

Architecture overview

The figure below gives a high-level view of my hybrid cloud platform:


As you can see, it is composed of the following elements:

  • A Data Fabric Layer (Data Plane Services) hosting the BMQ dashboard app, as well as Cloudbreak to deploy ephemeral clusters
  • One permanent cluster, hosting data ingestion from various health sources (e.g. Strava, MyFitnessPal, Fitbit). This ingestion will also execute the ML pipeline to predict the level of fatigue
  • Any number of ephemeral clusters executing the training and generation of the ML pipeline
  • Analytics and model training on the data stored in MySQL, using Zeppelin notebooks and Spark, whose results feed back into the BMQ application
  • Custom applications that consume the extracted and analyzed data

Note: I realize that this architecture showcases a near-future state (Cloudbreak is not technically part of Data Plane Service yet). But this series of articles is a long-term project, so the architecture is set a few weeks into the future.

Data Flows

Data will move through the platform in two major ways, detailed in the flows below.

Data Flow 1: Refreshing Data


Data Flow 2: Generating ML Pipeline using ephemeral clusters


Metrics analyzed and predicted

From a high vantage point, the platform will collect, analyze and predict the following pieces of data. My goal is to log all of them through various health and fitness services, then create a BMQ predictive model that I will apply to my upcoming training plan.

Activity Data

  • Distance
  • Duration
  • Avg. Heart Rate
  • Elevation
  • Avg. Pace

Nutrition Data

  • Calories in
  • % Carbs
  • % Proteins
  • % Fats

Sleep Data

  • Total sleep
  • % REM
  • % Light sleep
  • % Deep sleep
  • % Awake

Health Data

  • Total Cal. Burned
  • Avg. Heart Rate
  • Steps
  • Elevation
  • Weight

Perceived Fatigue

  • BMQ: Morning
  • BMQ: Pre-Workout
  • BMQ: Post-Workout
  • BMQ: Evening
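
To make the metric groups above more tangible, here is a minimal Python sketch of how one day of data could be modeled and flattened into a single row, for instance before loading it into a table or a DataFrame. This is only an illustration, not the schema the platform will actually use: the class names, field names, units (km, minutes, bpm, meters, kg, hours) and the idea of a numeric BMQ score are all assumptions of mine, since the article does not define them.

```python
from dataclasses import dataclass, asdict
from typing import Dict, Optional


# All names and units below are hypothetical; they simply mirror the
# metric lists in this article.
@dataclass
class Activity:
    distance_km: float
    duration_min: float
    avg_heart_rate_bpm: float
    elevation_m: float
    avg_pace_min_per_km: float


@dataclass
class Nutrition:
    calories_in: float
    pct_carbs: float
    pct_proteins: float
    pct_fats: float


@dataclass
class Sleep:
    total_sleep_hours: float
    pct_rem: float
    pct_light: float
    pct_deep: float
    pct_awake: float


@dataclass
class Health:
    total_calories_burned: float
    avg_heart_rate_bpm: float
    steps: int
    elevation_m: float
    weight_kg: float


@dataclass
class Fatigue:
    # Perceived fatigue (BMQ) at four points in the day.
    # The numeric scale is an assumption; None means "not logged".
    morning: Optional[int] = None
    pre_workout: Optional[int] = None
    post_workout: Optional[int] = None
    evening: Optional[int] = None


@dataclass
class DailyRecord:
    date: str  # ISO date string, e.g. "YYYY-MM-DD"
    activity: Optional[Activity] = None
    nutrition: Optional[Nutrition] = None
    sleep: Optional[Sleep] = None
    health: Optional[Health] = None
    fatigue: Optional[Fatigue] = None

    def to_row(self) -> Dict[str, object]:
        """Flatten the record into one dict with prefixed column names."""
        row: Dict[str, object] = {"date": self.date}
        groups = {
            "activity": self.activity,
            "nutrition": self.nutrition,
            "sleep": self.sleep,
            "health": self.health,
            "fatigue": self.fatigue,
        }
        for prefix, group in groups.items():
            if group is None:
                continue  # skip metric groups with no data for that day
            for key, value in asdict(group).items():
                row[f"{prefix}_{key}"] = value
        return row


if __name__ == "__main__":
    # Made-up values, for illustration only.
    record = DailyRecord(
        date="2000-01-01",
        activity=Activity(10.0, 55.0, 150.0, 120.0, 5.5),
        fatigue=Fatigue(morning=7, post_workout=4),
    )
    print(record.to_row())
```

Keeping each metric group optional reflects that some days will have no workout or no sleep log, and a flat row with prefixed column names is one straightforward shape for feeding a predictive model later on.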

Implementation Tutorials

The implementation of this platform will be detailed in the following tutorial articles: