Community Articles
Cloudera Employee

Introduction

A few weeks ago, I published an article called "Determining the big 5 traits of Personality Psychology of news articles using NiFi, Hive & Zeppelin". Since then, I have worked diligently to improve on that first iteration, with the objective of mocking up what is at the heart of every company today: an end-to-end platform that uses machine learning not only to generate insights, but also to keep improving itself and to feed consumer applications.

While this work was a way for me to get into the nitty-gritty of the latest Hortonworks tools, I decided to share it as a series of tutorial articles, because I believe it is a great way to get familiar with the stack.

Architecture overview

Luckily, the Hortonworks platform has all the elements needed to build this end-to-end platform. The figure below gives an overview of the architecture used throughout this series:

[Figure: Architecture overview of the end-to-end platform]

As you can see, the goal of this platform is to:

  1. Ingest data from news articles (directly from the NYT API at first, then from other RSS feeds)
  2. Using NiFi and SAM, read the metadata of the extracted articles, scrape their content, and run personality recognition on their authors; then expose the results via Kafka for Druid consumption, push them directly to HBase/Phoenix for offline analytics, and serve them through "micro" services for consumer applications
  3. Generate real-time insights on this computed data via Druid and Superset
  4. Enable analytics & model training on the data stored in HBase using Zeppelin notebooks & Spark, which then feed back into the personality recognition models
  5. Enable custom applications to consume the extracted and analyzed data
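To make the data flow of step 2 more concrete, here is a minimal plain-Python sketch of the record that moves between the stages. It is illustrative only and is not the NiFi/SAM implementation used in this series. The field names, the helpers `scrape_content` and `score_personality`, and the Phoenix row layout are all hypothetical. The personality scorer is a placeholder that returns neutral scores rather than a real model. Actually publishing to Kafka or upserting into Phoenix is deliberately left out so the sketch has no external dependencies.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Dict, Tuple

# The five Big Five (OCEAN) personality traits.
BIG_FIVE = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")


@dataclass
class ArticleMeta:
    """Metadata as it might arrive from a news API (hypothetical field names)."""
    url: str
    author: str
    headline: str


@dataclass
class EnrichedArticle:
    """Article after scraping and personality scoring (hypothetical schema)."""
    url: str
    author: str
    headline: str
    text: str
    traits: Dict[str, float] = field(default_factory=dict)


def scrape_content(meta: ArticleMeta) -> str:
    # Hypothetical placeholder: a real flow would fetch meta.url and extract the body.
    return f"{meta.headline}. Full text would be scraped from {meta.url}."


def score_personality(text: str) -> Dict[str, float]:
    # Hypothetical placeholder, NOT a personality model: neutral 0.5 for every trait.
    return {trait: 0.5 for trait in BIG_FIVE}


def enrich(meta: ArticleMeta) -> EnrichedArticle:
    """Scrape the article, score its author's traits, and bundle the result."""
    text = scrape_content(meta)
    return EnrichedArticle(meta.url, meta.author, meta.headline, text, score_personality(text))


def to_kafka_message(article: EnrichedArticle) -> bytes:
    # JSON bytes: the kind of payload a Kafka producer could send for Druid to ingest.
    return json.dumps(asdict(article), sort_keys=True).encode("utf-8")


def to_phoenix_row(article: EnrichedArticle) -> Tuple:
    # Flattened row for a hypothetical Phoenix table keyed by article URL.
    return (article.url, article.author, *(article.traits[t] for t in BIG_FIVE))


if __name__ == "__main__":
    article = enrich(ArticleMeta("https://example.com/story", "Jane Doe", "Example"))
    print(to_kafka_message(article).decode("utf-8"))
    print(to_phoenix_row(article))
```

Keeping a single enriched record and deriving both the Kafka payload and the Phoenix row from it mirrors the fan-out in step 2, where one computed result feeds both the real-time path (Druid) and the offline path (HBase/Phoenix).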

Agenda

This series of articles is composed of 4 parts:

Version history: revision 2 of 2, last updated 08-17-2019 06:40 AM.