I'm back with a new article series, following my previous series on news author personality recognition, beast mode quotient, and AI to edge (though the AI to edge series was recently replaced by one on identifying Magic: The Gathering cards).

In this series, I will show how to harness the hybrid cloud capabilities of Cloudera Data Platform (CDP). Throughout the series, you will learn how to use CDP Private Cloud Base, Replication Manager, CDP Public Cloud, NiFi, Kafka on Data Hub, Cloudera Data Warehouse, and Cloudera Viz.

Reminder: CDP Vision

CDP is designed to let you seamlessly deploy any data workload (data collection, streaming, enrichment, engineering, serving, and AI/ML) on any infrastructure, with the latest engines, while maintaining a coherent layer of security and governance (SDX, the Shared Data Experience).

[Figure: CDP vision overview]

Case Study: Worldwide Bank

For the purposes of this article, I will use the example of a fictional bank, Worldwide Bank.

Worldwide Bank is a large international bank that relies on a traditional on-premises big data architecture (CDP PvC Base) for data engineering and data warehousing over petabytes of data.

With COVID-19 taking the world through unprecedented times and competition at its highest, the bank is accelerating the modernization of its data organization by adopting the latest technologies and architectures, especially cloud infrastructure.

The bank's first use case on this new platform is a visual report assessing the risk to each of its branches as the virus spreads.

The implementation of this first use case has the following critical considerations:

  • Speed of implementation/cloud adoption
  • Maintenance of data privacy/security standards
  • Reuse of the current team's skill set (i.e., portability)

Implementation Architecture

After carefully considering its options, the bank selected CDP for its hybrid architecture because it satisfies all of these requirements. Specifically, here is the implementation design:

[Figure: Worldwide Bank implementation architecture]

This article series will guide you through these four steps:

  1. Replicate bank branch and employee data (Replication Manager, Cloudera Manager, S3, HDFS).
  2. Profile sensitive data and apply data protection (Data Catalog profilers, Atlas, Ranger).
  3. Enrich data by streaming COVID statistics (NiFi).
  4. Create interactive visual reports (Cloudera Data Warehouse, Hive LLAP, Viz).
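To make step 2 a little more concrete, here is a minimal Python sketch of what "profile sensitive data and apply data protection" involves conceptually: scan a column's values, tag the column if most values look sensitive, then mask tagged columns. This is not how the Data Catalog profilers or Ranger work internally; the regex patterns, the 80% match threshold, the helper names (profile_column, mask, protect), and the sample employee table are all hypothetical. The masking loosely resembles a "keep the last 4 characters" style of partial mask.

```python
import re
from typing import Dict, List

# Hypothetical detection rules; real Data Catalog profilers use their own.
PATTERNS = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "PHONE": re.compile(r"^\+?\d[\d\s-]{7,}\d$"),
}


def profile_column(values: List[str], threshold: float = 0.8) -> List[str]:
    """Return the tags whose pattern matches at least `threshold` of the non-empty values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return []
    tags = []
    for tag, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.match(v))
        if hits / len(non_empty) >= threshold:
            tags.append(tag)
    return tags


def mask(value: str, keep_last: int = 4) -> str:
    """Replace every character except the last `keep_last` with 'x'."""
    if len(value) <= keep_last:
        return value
    return "x" * (len(value) - keep_last) + value[-keep_last:]


def protect(table: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Mask every column the profiler tags as sensitive; copy the rest unchanged."""
    protected = {}
    for column, values in table.items():
        if profile_column(values):
            protected[column] = [mask(v) for v in values]
        else:
            protected[column] = list(values)
    return protected


if __name__ == "__main__":
    # Hypothetical sample of the replicated employee data.
    employees = {
        "name": ["Ana", "Bo"],
        "email": ["ana@wwbank.com", "bo@wwbank.com"],
    }
    print(protect(employees))
    # {'name': ['Ana', 'Bo'], 'email': ['xxxxxxxxxx.com', 'xxxxxxxxx.com']}
```

In CDP, these roles are played by the Data Catalog profilers, Atlas tags, and Ranger policies listed in step 2; Ranger masking policies are applied when the data is queried rather than by rewriting the stored data as this sketch does.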

 

Note: all assets for this series can be found here.

 
