Created on 02-03-2020 02:31 AM - edited on 02-11-2020 12:31 AM by VidyaSargur
In part 1 of this series, we talked about the growing relevance of streaming technologies and covered the need for existing Cloudera customers currently using Apache Flume to consider moving to Cloudera DataFlow (CDF).
Cloudera DataFlow is an umbrella term that covers the streaming technologies from Cloudera. CDF is supported on CDH 5 / CDH 6 and HDP 2 / HDP 3, so nothing stops customers from adopting Cloudera DataFlow right now and being in a supported configuration when they upgrade to the new Cloudera Data Platform (CDP).
CDF includes the technology to address a number of areas:
A good summary of these components can be found in the blog post Introducing Cloudera DataFlow (CDF).
Cloudera DataFlow - Data-In-Motion Platform
If you are a traditional Cloudera customer using the Cloudera Distribution of Apache Kafka, there are a number of new and exciting management technologies available via CDF. For example, the Cloudera Streams Management component includes:
However, Apache Flume has been replaced in CDF by Apache NiFi and MiNiFi. There are a number of benefits to using Apache NiFi / MiNiFi over Apache Flume:
Continuous data delivery, streaming applications, and real-time analysis are becoming increasingly important and more widely adopted as part of a data architecture strategy. However, so is the need to comply with data protection regulations such as GDPR in the EU and CCPA in California. This is why technologies such as Apache NiFi, with graphical data pipelines and built-in support for data lineage and provenance, provide a strong framework for working towards regulatory compliance.
One of the reasons customers adopt Cloudera technology is the portfolio of technology that we offer, all under a governed, secure, and integrated data and analytics platform. This means that we can integrate and build different streaming applications to address a variety of use cases. For example, Cloudera supports Apache HBase and Apache Kudu as backend storage for real-time applications. In addition, Cloudera Machine Learning means that we can build predictive models, then manage and deploy them into streaming applications. This is why we describe Cloudera as an end-to-end Edge2AI platform.
Created on 01-13-2021 07:10 AM
Here are a few examples of moving Flume flows to NiFi.
https://www.datainmotion.dev/2019/08/migrating-apache-flume-flows-to-apache.html
https://www.datainmotion.dev/2019/10/migrating-apache-flume-flows-to-apache.html