What's New @ Cloudera

Find the latest Cloudera product news

[RELEASED] Cloudera Streaming Analytics - Kubernetes Operator 1.2

Release Highlights

  • Rebase on Flink 1.19.2
  • General availability of Cloudera SQL Stream Builder
    • Cloudera SQL Stream Builder, previously in Technical Preview, is now generally available in Cloudera Streaming Analytics - Kubernetes Operator.

      Cloudera SQL Stream Builder is a comprehensive interactive user interface for creating stateful stream processing jobs using SQL. For more information about SQL Stream Builder and its features, see the Getting started with SQL Stream Builder page.
  • LDAP authentication in SQL Stream Builder
    • LDAP-based authentication is available for Cloudera SQL Stream Builder. For more information, refer to LDAP authentication.
  • Custom truststores
    • You can now specify custom truststores when installing the Cloudera Streaming Analytics - Kubernetes Operator. For more information, refer to Security configurations.
  • Secure TLS connections to the SSB UI and API
    • Users can connect to the Cloudera SQL Stream Builder UI and API over a TLS-encrypted channel. For more information, refer to Routing with ingress.
  • Using Python UDFs for table transformations and webhook connector
    • Because JavaScript User-defined Functions (UDFs) are deprecated in Cloudera Streaming Analytics - Kubernetes Operator (see Deprecation notices), support for Python UDFs for table transformations and the webhook connector has been added to Cloudera SQL Stream Builder. For more information, refer to Creating Python User-defined Functions. A minimal Python UDF sketch follows this list.
  • Configurable requests and limits for all resources
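
As a quick illustration of the Python UDF support mentioned above, here is a minimal PyFlink-style scalar UDF sketch. The function name, its masking logic, and the registration call are illustrative only; for the exact Cloudera SQL Stream Builder workflow, refer to Creating Python User-defined Functions.

    from pyflink.table import DataTypes, EnvironmentSettings, TableEnvironment
    from pyflink.table.udf import udf

    # Hypothetical scalar UDF that masks all but the last four characters of a value.
    @udf(result_type=DataTypes.STRING())
    def mask_tail(value: str) -> str:
        if value is None:
            return None
        return "*" * max(len(value) - 4, 0) + value[-4:]

    # Standard PyFlink registration, shown for completeness; in SQL Stream Builder the
    # function is created through the UI as described in the documentation.
    t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
    t_env.create_temporary_function("mask_tail", mask_tail)
    # The function can then be used in SQL, for example:
    #   SELECT mask_tail(card_number) FROM payments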

Please see the Release Notes for the complete list of fixes and improvements.

Getting to the New Release

To upgrade to Cloudera Streaming Analytics - Kubernetes Operator 1.2, check out this upgrade guide. If you are installing for the first time, use this installation overview.

Use Cases

  • Event-Driven Applications: Stateful applications that ingest events from one or more event streams and react to incoming events by triggering computations, state updates, or external actions.

    Apache Flink excels at handling time and state for these applications, and it can scale to manage very large data volumes (up to several terabytes). It has a rich set of APIs, ranging from low-level control to high-level functionality such as Flink SQL, letting developers choose the most suitable option for implementing advanced business logic. A minimal stateful-processing sketch follows the examples below.


    Apache Flink’s outstanding feature for event-driven applications is its support for savepoints. A savepoint is a consistent state image that can be used as a starting point for compatible applications. Given a savepoint, an application can be updated or adapt its scale, or multiple versions of an application can be started for A/B testing.

    Examples:
    • Fraud detection
    • Anomaly detection
    • Rule-based alerting
    • Business process monitoring
    • Web application (social network)
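
    As a minimal sketch of this kind of stateful, rule-based processing, the PyFlink snippet below raises an alert after three consecutive failed events per key. The class name, input data, and three-failure rule are illustrative; it uses only standard PyFlink APIs, not Cloudera-specific ones.

        from pyflink.common.typeinfo import Types
        from pyflink.datastream import StreamExecutionEnvironment
        from pyflink.datastream.functions import KeyedProcessFunction, RuntimeContext
        from pyflink.datastream.state import ValueStateDescriptor

        class AlertOnRepeatedFailures(KeyedProcessFunction):
            """Emits an alert after three consecutive FAILED events per key (illustrative rule)."""

            def open(self, runtime_context: RuntimeContext):
                # Keyed state is checkpointed, so it survives restarts and savepoint-based upgrades.
                self.failures = runtime_context.get_state(
                    ValueStateDescriptor("consecutive_failures", Types.INT()))

            def process_element(self, value, ctx):
                count = (self.failures.value() or 0) + 1 if value[1] == "FAILED" else 0
                self.failures.update(count)
                if count >= 3:
                    yield value[0], "ALERT: %d consecutive failures" % count

        env = StreamExecutionEnvironment.get_execution_environment()
        events = env.from_collection(
            [("user-1", "FAILED"), ("user-1", "FAILED"), ("user-1", "FAILED")],
            type_info=Types.TUPLE([Types.STRING(), Types.STRING()]))
        alerts = events.key_by(lambda e: e[0], key_type=Types.STRING()).process(
            AlertOnRepeatedFailures(),
            output_type=Types.TUPLE([Types.STRING(), Types.STRING()]))
        alerts.print()
        env.execute("rule-based-alerting-sketch")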

 

  • Data Analytics Applications: With a sophisticated stream processing engine, analytics can be performed in real-time. Streaming queries or applications ingest real-time event streams and continuously produce and update results as events are consumed. The results are written to an external database or maintained as internal state. A dashboard application can read the latest results from the external database or directly query the internal state of the application.

    Apache Flink supports streaming as well as batch analytical applications; a minimal streaming SQL sketch follows the examples below.


    Examples:
    • Quality monitoring of telco networks
    • Analysis of product updates & experiment evaluation in mobile applications
    • Ad-hoc analysis of live data in consumer technology
    • Large-scale graph analysis
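
    The sketch below expresses a continuously updating aggregation of this kind as a streaming SQL query run through PyFlink's Table API. The table name, schema, and the built-in datagen source are illustrative; in Cloudera SQL Stream Builder the same query could be authored interactively.

        from pyflink.table import EnvironmentSettings, TableEnvironment

        t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

        # Unbounded source backed by Flink's built-in 'datagen' connector (illustrative schema).
        t_env.execute_sql("""
            CREATE TABLE network_metrics (
                cell_id INT,
                latency_ms DOUBLE
            ) WITH (
                'connector' = 'datagen',
                'rows-per-second' = '10',
                'fields.cell_id.min' = '1',
                'fields.cell_id.max' = '5',
                'fields.latency_ms.min' = '1',
                'fields.latency_ms.max' = '500'
            )
        """)

        # Continuously updated per-cell latency aggregates; results are emitted and
        # revised as new events arrive.
        t_env.execute_sql("""
            SELECT cell_id,
                   AVG(latency_ms) AS avg_latency_ms,
                   COUNT(*) AS samples
            FROM network_metrics
            GROUP BY cell_id
        """).print()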

 

  • Data Pipeline Applications: Streaming data pipelines serve a similar purpose to Extract-Transform-Load (ETL) jobs: they transform and enrich data and can move it from one storage system to another. However, they operate in a continuous streaming mode instead of being periodically triggered. Hence, they can read records from sources that continuously produce data and move them with low latency to their destination. A minimal continuous ETL sketch follows the examples below.

    Examples:
    • Real-time search index building in e-commerce
    • Continuous ETL in e-commerce
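
    As a minimal sketch of such a pipeline, the PyFlink snippet below continuously copies cleaned order events from a Kafka topic into files. The topic, broker address, schema, and output path are illustrative, and the Kafka SQL connector artifact must be available to the job.

        from pyflink.table import EnvironmentSettings, TableEnvironment

        t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
        # Checkpointing lets the filesystem sink commit finished files.
        t_env.get_config().set("execution.checkpointing.interval", "10 s")

        # Source: raw order events from Kafka (topic, brokers, and schema are illustrative).
        t_env.execute_sql("""
            CREATE TABLE orders_raw (
                order_id STRING,
                product_id STRING,
                amount DOUBLE,
                ts TIMESTAMP(3)
            ) WITH (
                'connector' = 'kafka',
                'topic' = 'orders',
                'properties.bootstrap.servers' = 'kafka:9092',
                'properties.group.id' = 'orders-etl',
                'scan.startup.mode' = 'earliest-offset',
                'format' = 'json'
            )
        """)

        # Sink: cleaned records written continuously as JSON files (path is illustrative).
        t_env.execute_sql("""
            CREATE TABLE orders_clean (
                order_id STRING,
                product_id STRING,
                amount DOUBLE,
                ts TIMESTAMP(3)
            ) WITH (
                'connector' = 'filesystem',
                'path' = '/tmp/orders_clean',
                'format' = 'json'
            )
        """)

        # Continuous ETL: filter in flight instead of running periodic batch jobs.
        t_env.execute_sql("""
            INSERT INTO orders_clean
            SELECT order_id, product_id, amount, ts
            FROM orders_raw
            WHERE amount > 0
        """).wait()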

 

Public Resources