Community Articles

pvidal · 11-01-2018

Introduction

Knowing how and when to leverage cloud infrastructure, and when to use on-premises equipment instead, is at the heart of today's mature enterprise data strategies. I decided to dive deeper into the subject and, as I did with the Personality Detection data science and engineering platform I built, I wanted to use a concrete example to illustrate how to architect a hybrid cloud infrastructure.

These types of endeavors can be quite theoretical (read: hard to follow and potentially boring), so I decided to attach this hybrid cloud platform demonstration to something very concrete: my own health. I signed up for a marathon that will take place on March 29, and I start training on November 5. I plan to use my training, sleep, nutrition and general health data to evaluate, and eventually predict, my perceived level of fatigue, which I have named the "Beast Mode Quotient", or BMQ.

This article is an introduction to the architecture and data flows I will put in place. It will link to follow-up articles: tutorials that anyone can follow to implement their own hybrid cloud strategy.

Architecture overview

The figure below gives an overview of my hybrid cloud platform:

As you can see, it is composed of the following elements:

A Data Fabric layer (Data Plane Services) hosting the BMQ dashboard app, as well as Cloudbreak to deploy ephemeral clusters
One permanent cluster that ingests data from various health sources (e.g. Strava, MyFitnessPal, Fitbit) and also executes the ML pipeline to predict my level of fatigue
Any number of ephemeral clusters that train and generate the ML pipeline
Analytics and model training on the data stored in MySQL, using Zeppelin notebooks and Spark, with the results fed back into the BMQ application
Custom applications that consume the extracted and analyzed data
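On the permanent cluster, the ingestion side ultimately has to merge readings that arrive from several services at different times into the data stored in MySQL. The sketch below is a minimal, hypothetical illustration of one way to do that, assuming one row per day: it uses Python's built-in `sqlite3` module as a stand-in for MySQL (in MySQL the equivalent statement would use `INSERT ... ON DUPLICATE KEY UPDATE`), and the `daily_metrics` table, its columns and the `upsert_day` helper are invented for this example rather than taken from the actual platform.

```python
import sqlite3

# Hypothetical schema: one row per calendar day, one nullable column per metric.
# sqlite3 stands in for MySQL; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS daily_metrics (
    day TEXT PRIMARY KEY,       -- ISO date, e.g. '2018-11-05'
    distance_km REAL,           -- e.g. from an activity service such as Strava
    calories_in REAL,           -- e.g. from a nutrition service such as MyFitnessPal
    total_sleep_min REAL        -- e.g. from a sleep tracker such as Fitbit
)
"""

# Whitelist of column names, so metric names are never interpolated unchecked into SQL.
METRIC_COLUMNS = {"distance_km", "calories_in", "total_sleep_min"}


def upsert_day(conn: sqlite3.Connection, day: str, **metrics: float) -> None:
    """Insert a row for `day`, or update only the supplied columns if the row exists."""
    if not metrics:
        return
    unknown = set(metrics) - METRIC_COLUMNS
    if unknown:
        raise ValueError(f"unknown metrics: {sorted(unknown)}")
    cols = sorted(metrics)
    placeholders = ", ".join("?" for _ in cols)
    updates = ", ".join(f"{c} = excluded.{c}" for c in cols)
    conn.execute(
        f"INSERT INTO daily_metrics (day, {', '.join(cols)}) "
        f"VALUES (?, {placeholders}) "
        f"ON CONFLICT(day) DO UPDATE SET {updates}",
        [day] + [metrics[c] for c in cols],
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Each source refreshes on its own schedule and only touches its own columns.
    upsert_day(conn, "2018-11-05", distance_km=10.2)
    upsert_day(conn, "2018-11-05", calories_in=2450, total_sleep_min=412)
    print(conn.execute("SELECT * FROM daily_metrics").fetchall())
```

Updating only the columns a given source supplies means a late nutrition sync never overwrites the activity data already recorded for that day.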

Note: I realize that this architecture shows a near-future state (Cloudbreak is not technically part of Data Plane Services yet). But this series of articles is a long-term project, so the architecture is set a few weeks into the future.

Data Flows

The components of the platform will interact in two major ways, detailed in the flows below.

Data Flow 1: Refreshing Data

Data Flow 2: Generating ML Pipeline using ephemeral clusters

Metrics analyzed and predicted

From a high vantage point, the platform will collect, analyze and predict the following pieces of data. My goal is to log all of this data through various health and fitness services, and to create a BMQ predictive model that I will then apply to my upcoming training plan.

Activity Data

Distance
Duration
Avg. Heart Rate
Elevation
Avg. Pace
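Average pace is not independent of the other activity metrics: it is duration divided by distance, so it can be derived whenever a source reports only those two values. A minimal sketch, assuming distance in kilometres and duration in seconds (the units are an assumption, not something the services above guarantee); `avg_pace` is a hypothetical helper.

```python
def avg_pace(distance_km: float, duration_s: float) -> str:
    """Return average pace as 'M:SS /km', i.e. duration divided by distance."""
    if distance_km <= 0:
        raise ValueError("distance must be positive")
    sec_per_km = round(duration_s / distance_km)  # whole seconds per kilometre
    minutes, seconds = divmod(sec_per_km, 60)
    return f"{minutes}:{seconds:02d} /km"


print(avg_pace(10.0, 3000))        # 10 km in 50 minutes -> 5:00 /km
print(avg_pace(42.195, 4 * 3600))  # a 4-hour marathon -> 5:41 /km
```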

Nutrition Data

Calories in
% Carbs
% Proteins
% Fats
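Macronutrient percentages are normally expressed as a share of calories, not of weight. The sketch below assumes the source reports grams per macronutrient and converts them with the standard Atwater factors (4 kcal/g for carbohydrates and protein, 9 kcal/g for fat). Apps such as MyFitnessPal may compute their own percentages differently, and the macronutrient total will not necessarily match the logged "Calories in" (alcohol, fiber and rounding all play a part). `macro_split` is a hypothetical helper.

```python
# Standard Atwater factors, in kcal per gram.
KCAL_PER_G = {"carbs": 4, "protein": 4, "fat": 9}


def macro_split(carbs_g: float, protein_g: float, fat_g: float) -> dict:
    """Return each macronutrient's share of total macronutrient calories, in percent."""
    kcal = {
        "carbs": carbs_g * KCAL_PER_G["carbs"],
        "protein": protein_g * KCAL_PER_G["protein"],
        "fat": fat_g * KCAL_PER_G["fat"],
    }
    total = sum(kcal.values())
    if total == 0:
        raise ValueError("no calories logged")
    return {name: round(100 * value / total, 1) for name, value in kcal.items()}


print(macro_split(250, 100, 60))  # {'carbs': 51.5, 'protein': 20.6, 'fat': 27.8}
```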

Sleep Data

Total sleep
% REM
% Light sleep
% Deep sleep
% Awake
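The sleep percentages depend on which denominator is used. A minimal sketch, assuming the tracker reports minutes per stage (REM, light, deep, awake) and that all four percentages are taken over total time in bed, so that they add up to roughly 100%. "Total sleep" here excludes time awake. `sleep_breakdown` and its output keys are hypothetical.

```python
def sleep_breakdown(rem_min: float, light_min: float, deep_min: float, awake_min: float) -> dict:
    """Summarize one night: total sleep in minutes plus each stage's share of time in bed."""
    stages = {"rem": rem_min, "light": light_min, "deep": deep_min, "awake": awake_min}
    in_bed = sum(stages.values())
    if in_bed == 0:
        raise ValueError("no sleep data")
    summary = {"total_sleep_min": in_bed - awake_min}  # time awake is not counted as sleep
    for name, minutes in stages.items():
        summary[f"pct_{name}"] = round(100 * minutes / in_bed, 1)
    return summary


print(sleep_breakdown(90, 240, 60, 30))
```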

Health Data

Total Cal. Burned
Avg. Heart Rate
Steps
Elevation
Weight

Perceived Fatigue

BMQ: Morning
BMQ: Pre-Workout
BMQ: Post-Workout
BMQ: Evening
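The end goal is a model that maps the metrics above to the BMQ readings. The actual pipeline is meant to be trained with Spark on the ephemeral clusters. As a much simpler stand-in, the sketch below fits a one-feature ordinary least squares line (morning BMQ against hours slept the night before) in plain Python. The BMQ scale (1 to 10 here), the choice of feature and the sample numbers are all made up for illustration; they are not real measurements or results.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("feature has no variance")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx           # slope: covariance over variance of the feature
    return y_bar - b * x_bar, b  # intercept makes the line pass through the means


# Toy inputs: hours slept the night before vs. morning BMQ (assumed 1-10 scale).
sleep_h = [5.5, 6.0, 7.0, 7.5, 8.0]
bmq_morning = [4, 5, 6, 7, 8]

a, b = fit_line(sleep_h, bmq_morning)
print(f"BMQ ~ {a:.2f} + {b:.2f} * hours_slept")
print(round(a + b * 6.5, 1))  # predicted morning BMQ after 6.5 hours of sleep
```

A real model would use many more features (training load, nutrition, the other BMQ readings) and would be validated on days it was not trained on.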

Implementation Tutorials

The implementation of this platform will be detailed in the following upcoming tutorial articles:

Beast Mode Quotient: Using Hybrid Cloud architecture to predict athletes levels of fatigue