Member since 06-26-2015, 509 posts, 136 kudos received, 114 solutions

My Accepted Solutions
Views | Posted
---|---
1291 | 09-20-2022 03:33 PM
3803 | 09-19-2022 04:47 PM
2253 | 09-11-2022 05:01 PM
2340 | 09-06-2022 02:23 PM
3644 | 09-06-2022 04:30 AM
10-27-2024
11:58 PM
Cloudera’s Data In Motion Team is pleased to announce the release of Cloudera Streaming Analytics - Kubernetes Operator 1.1, an integral component of Cloudera Streaming - Kubernetes Operator. This release includes improvements to SQL Stream Builder as well as an update to Apache Flink 1.19.1.

Use Cases

Event-Driven Applications: Stateful applications that ingest events from one or more event streams and react to incoming events by triggering computations, state updates, or external actions. Apache Flink excels at handling time and state for these applications and can scale to very large data volumes (up to several terabytes). Its rich set of APIs, ranging from low-level controls to high-level functionality such as Flink SQL, lets developers choose the most suitable option for implementing advanced business logic. Above all, Apache Flink’s standout feature for event-driven applications is its support for savepoints. A savepoint is a consistent state image that can be used as a starting point for compatible applications. Given a savepoint, an application can be updated or rescaled, or multiple versions of an application can be started for A/B testing.
Examples: fraud detection, anomaly detection, rule-based alerting, business process monitoring, web applications (social networks).

Data Analytics Applications: With a sophisticated stream processing engine, analytics can also be performed in real time. Streaming queries or applications ingest real-time event streams and continuously produce and update results as events are consumed. The results are written to an external database or maintained as internal state. A dashboard application can read the latest results from the external database or directly query the application's internal state. Apache Flink supports both streaming and batch analytical applications.
Examples: quality monitoring of telco networks, analysis of product updates and experiment evaluation in mobile applications, ad-hoc analysis of live data in consumer technology, large-scale graph analysis.

Data Pipeline Applications: Streaming data pipelines serve a similar purpose to extract-transform-load (ETL) jobs: they transform and enrich data and can move it from one storage system to another. However, they run continuously instead of being triggered periodically, so they can read records from sources that continuously produce data and move them with low latency to their destination.
Examples: real-time search index building in e-commerce, continuous ETL in e-commerce.

Release Highlights

Rebase to Apache Flink 1.19.1: Streaming analytics deployments, including SQL Stream Builder, now support Apache Flink 1.19.1, which includes the Apache Flink improvements below. For more information on improvements and deprecations, please check the Apache Flink 1.19 release announcement.
- Custom Parallelism for Table/SQL Sources: The DataGen connector now supports setting a custom parallelism for performance tuning via the scan.parallelism option. Support for other connectors will come in future releases.
- Configure Different State Time-to-Live (TTL) Values Using a SQL Hint: Users now have a more flexible way to specify custom TTL values for the state of regular joins and group aggregations directly within their queries by using the STATE_TTL hint.
- Named Parameters: Named parameters can now be used when calling a function or stored procedure in Flink SQL.
- Support for SESSION Window Table-Valued Functions (TVFs) in Streaming Mode: Users can now use SESSION window TVFs in streaming mode.
- Support for Changelog Inputs for Window TVF Aggregation: Window aggregation operators can now handle changelog streams (e.g., Change Data Capture [CDC] data sources).
- New UDF Type, AsyncScalarFunction: AsyncScalarFunction is a user-defined asynchronous ScalarFunction that allows issuing concurrent function calls asynchronously.
- MiniBatch Optimization for Regular Joins: The new mini-batch optimization can be used for regular joins to reduce intermediate results, especially in cascading join scenarios.
- Dynamic Source Parallelism Inference for Batch Jobs: Allows source connectors to dynamically infer their parallelism based on the actual amount of data to consume.
- Standard YAML for Apache Flink Configuration: Apache Flink has officially introduced full support for the standard YAML 1.2 syntax in the configuration file.
- Profiling JobManager/TaskManager in the Apache Flink Web UI: Support for triggering profiling at the JobManager/TaskManager level.
- New Config Options for Administrator Java Virtual Machine (JVM) Options: A set of administrator JVM options is available to prepend default values to the user-set JVM options for platform-wide JVM tuning.
- Using a Larger Checkpointing Interval When the Source Is Processing Backlog: Users can set execution.checkpointing.interval-during-backlog to use a larger checkpoint interval, which improves throughput while the job is processing backlog, provided the source is backlog-aware.
- CheckpointsCleaner Cleans Individual Checkpoint States in Parallel: When disposing of checkpoints that are no longer needed, every state handle/state file is now disposed of in parallel for better performance.
- Trigger Checkpoints through the Command Line Client: The command line interface supports triggering a checkpoint manually.
- New Interfaces to SinkV2 That Are Consistent with the Source API.
- New Committer Metrics to Track the Status of Committables.

Rebase to Apache Flink Kubernetes Operator 1.9.0: The highlights of this update are listed below. For more details on improvements, please check the Apache Flink Kubernetes Operator 1.9.0 release announcement.
- Operator Optimizations: Reduced the overall memory usage of the operator.
- High Availability: Fixed a bug causing unpredictable behaviour when changing watched namespaces in an HA setup.
- Autoscaler CPU and Memory Quotas: Users can now set CPU and memory quotas that the autoscaler will respect and not scale beyond.
- Autoscaler Improvements: Several improvements for autoscaling are included in this release.

Support for Python User-Defined Functions (UDFs) in SQL Stream Builder: The current JavaScript UDFs in SQL Stream Builder will not work on Java 17 and later because the Nashorn engine was deprecated and removed from the Java Development Kit (JDK). Python UDFs in SQL Stream Builder let customers create new UDFs that will continue to be supported on future JDKs. JavaScript UDFs are deprecated in this release and will be removed in a future release. Cloudera recommends that customers use Python UDFs for all new development and start migrating their JavaScript UDFs to Python UDFs to prepare for future upgrades.

Session Cluster Management in SQL Stream Builder: SQL Stream Builder now displays information about existing session clusters and allows those clusters to be terminated from the UI.

Connectors and Formats Included by Default in the SQL Stream Builder Image: The following connectors and formats are now included in SQL Stream Builder's Docker image by default, so users can use them without building their own customized images.
- Connectors: Kafka, JDBC, CDC (MySQL, Oracle, Postgres, Db2, SqlServer), Amazon S3, Azure Blob Storage, Google Cloud Storage
- Formats: JSON, Avro, ORC, Parquet

Global Logging Configuration for All SSB Jobs: A new global settings view lets the administrator set default logging configurations. These settings are applied to all streaming jobs by default and can be overridden at the job level. This ensures that a consistent logging standard is applied by default for all users and developers.

Please see the Release Notes for the complete list of fixes and improvements.

Getting to the new release

To upgrade to Cloudera Streaming Analytics Operator 1.1, check out this upgrade guide. If you are installing the operator for the first time, use this installation overview.

Public Resources
- New - What’s New in Cloudera Streaming Analytics Operator 1.1
- Updated - Cloudera Streams Messaging Operator Documentation
- Updated - Cloudera Stream Processing Product Page
- Cloudera Kubernetes Operators documentation homepage
- Cloudera Stream Processing Community Edition
- Accelerate Streaming Pipeline Deployments with New Kubernetes Operators webinar recording
- Updated - Cloudera Stream Processing & Analytics Support Lifecycle Policy
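Two of the Flink 1.19 SQL features listed above, the DataGen `scan.parallelism` option and the `STATE_TTL` hint, are easiest to understand as SQL. The sketch below is minimal and only assembles the statement text, which you would then paste into SQL Stream Builder. The table names (`orders`, `products`), columns, and TTL values are hypothetical, and the `products` table is assumed to exist already. `'rows-per-second'` is a standard DataGen option that does not appear in the release notes.

```python
# Minimal sketch: build Flink SQL text for two Flink 1.19 features.
# Table names ("orders", "products"), columns, and TTL values are hypothetical.


def datagen_table_ddl(table: str, parallelism: int, rows_per_second: int = 10) -> str:
    """Return a CREATE TABLE statement for a DataGen source with its own parallelism."""
    if parallelism < 1:
        raise ValueError("parallelism must be a positive integer")
    return (
        f"CREATE TABLE {table} (\n"
        "  order_id BIGINT,\n"
        "  product_id BIGINT,\n"
        "  amount DOUBLE\n"
        ") WITH (\n"
        "  'connector' = 'datagen',\n"
        f"  'rows-per-second' = '{rows_per_second}',\n"
        # scan.parallelism sets the parallelism of this source only.
        f"  'scan.parallelism' = '{parallelism}'\n"
        ")"
    )


def join_with_state_ttl(orders_ttl: str, products_ttl: str) -> str:
    """Return a regular join whose per-input state TTL is set with the STATE_TTL hint."""
    return (
        f"SELECT /*+ STATE_TTL('o' = '{orders_ttl}', 'p' = '{products_ttl}') */\n"
        "  o.order_id, p.product_name, o.amount\n"
        "FROM orders AS o\n"
        "JOIN products AS p ON o.product_id = p.product_id"
    )


if __name__ == "__main__":
    print(datagen_table_ddl("orders", parallelism=4) + ";")
    print(join_with_state_ttl("1d", "12h") + ";")
```

The keys inside `STATE_TTL` can be table names or aliases. In this sketch, join state from `orders` is kept for one day and join state from `products` for twelve hours, instead of a single job-wide TTL.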
09-06-2024
03:25 AM
Cloudera’s Data In Motion Team is pleased to announce the release of Cloudera Streams Messaging Operator 1.1, an integral component of Cloudera Streaming - Kubernetes Operator. With this release, customers get Kafka Connect support and Kafka replication in the operator.

Use Cases

Loading and unloading data from Kafka: Kafka Connect gives Kafka users a simple way to get data quickly from a source and feed it to a Kafka topic. It also lets them get data from a topic and copy it to an external destination. Adding Kafka Connect support to the operator gives our customers a tool for moving data in and out of Kafka efficiently.

Replicating data to other sites: Disaster resilience is an important aspect of any Kafka production deployment. Cloudera Streaming - Kubernetes Operator now supports configuring and running Kafka replication flows across any two Kafka clusters. These clusters can be in the same data center or in different data centers to provide increased resilience against disasters.

Kafka migrations: Customers can migrate or replicate data between containerized Kafka clusters and on-premises or cloud-based clusters with Cloudera Streaming - Kubernetes Operator. Data can be replicated in any direction and between two or more clusters at a time.

Release Highlights

Rebase on Strimzi 0.41.0: This release of Cloudera Streams Messaging Operator has been rebased on Strimzi 0.41.0. For more information, see the Strimzi 0.41.0 Release Notes.

Kafka Connect support: Deploy Kafka Connect clusters and Kafka connectors using KafkaConnect and KafkaConnector resources. For more information, see Deploying Kafka Connect clusters.

Kafka replication support: Set up data replication between Kafka clusters using Cloudera Streams Messaging Operator. The operator uses a Kafka Connect-based approach to replicating Kafka data that is scalable, robust, and fault tolerant. It supports the same key features as MirrorMaker 2, allowing users to:
- Replicate Kafka topic partitions to keep multiple copies of the same data in different Kafka clusters and avoid data loss if a data center fails.
- Replicate Kafka consumer group offsets so they can fail over between clusters without losing data.
- Monitor replication at any time.

In addition, Kafka Connect-based replication has a number of advantages over MirrorMaker 2:
- Single Message Transforms (SMTs) can be configured for data replication.
- Source offsets can be manipulated using the Kafka Connect REST API.
- Some replication architectures, such as unidirectional replication, require fewer resources and Kafka Connect groups when using overrides for heartbeating.

For more information, see Replication Overview. For the complete list of fixes and improvements, read these Release Notes.

Getting to the new release

To upgrade to Cloudera Streams Messaging Operator 1.1, check out this upgrade guide. If you are installing the operator for the first time, use this installation overview.

Public Resources
- New - What’s New in Cloudera Stream Operator 1.1
- Updated - Cloudera Streams Messaging Operator Documentation
- Updated - Cloudera Stream Processing Product Page
- Cloudera Kubernetes Operators documentation homepage
- Cloudera Stream Processing Community Edition
- Accelerate Streaming Pipeline Deployments with New Kubernetes Operators webinar recording
- Updated - Cloudera Stream Processing & Analytics Support Lifecycle Policy
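The sketch below shows what a KafkaConnector resource for replication might look like. It builds a Strimzi `KafkaConnector` resource that runs Kafka's `MirrorSourceConnector` and prints it as JSON, which `kubectl apply -f` accepts. It makes several assumptions:

- The resource fields follow Strimzi's `kafka.strimzi.io/v1beta2` API.
- The connector settings are standard MirrorMaker 2 options.
- All names and addresses are hypothetical placeholders.
- The target KafkaConnect resource carries the `strimzi.io/use-connector-resources: "true"` annotation, which Strimzi requires before it manages connectors through KafkaConnector resources.

Cloudera's Replication Overview may add further connectors, such as checkpoint and heartbeat connectors, along with security settings that this sketch omits.

```python
import json
from typing import Any, Dict


def mirror_source_connector(
    name: str,
    connect_cluster: str,
    source_alias: str,
    source_bootstrap: str,
    target_alias: str,
    target_bootstrap: str,
    topics: str = ".*",
    tasks_max: int = 2,
) -> Dict[str, Any]:
    """Return a Strimzi KafkaConnector resource that runs MirrorSourceConnector."""
    return {
        "apiVersion": "kafka.strimzi.io/v1beta2",
        "kind": "KafkaConnector",
        "metadata": {
            "name": name,
            # Ties the connector to an existing KafkaConnect resource by name.
            "labels": {"strimzi.io/cluster": connect_cluster},
        },
        "spec": {
            "class": "org.apache.kafka.connect.mirror.MirrorSourceConnector",
            "tasksMax": tasks_max,
            "config": {
                "source.cluster.alias": source_alias,
                "source.cluster.bootstrap.servers": source_bootstrap,
                "target.cluster.alias": target_alias,
                "target.cluster.bootstrap.servers": target_bootstrap,
                # Regular expression selecting which topics to replicate.
                "topics": topics,
            },
        },
    }


if __name__ == "__main__":
    manifest = mirror_source_connector(
        name="primary-to-secondary",  # hypothetical names and addresses
        connect_cluster="replication-connect",
        source_alias="primary",
        source_bootstrap="primary-kafka-bootstrap:9092",
        target_alias="secondary",
        target_bootstrap="secondary-kafka-bootstrap:9092",
    )
    print(json.dumps(manifest, indent=2))
```

In MirrorMaker 2's design, this source connector copies topic data. Consumer group offset replication is handled by a separate checkpoint connector.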
08-19-2024
12:20 AM
We are pleased to announce the release of Cloudera Streaming Analytics 1.13 for Cloudera Private Cloud Base 7.1.9 SP1. This release includes improvements to SQL Stream Builder as well as an update to Apache Flink 1.19.1.

Use Cases

Event-Driven Applications: Stateful applications that ingest events from one or more event streams and react to incoming events by triggering computations, state updates, or external actions. Apache Flink excels at handling time and state for these applications and can scale to very large data volumes (up to several terabytes) with exactly-once consistency guarantees. Moreover, Apache Flink’s support for event time, highly customizable window logic, and fine-grained control of time through the ProcessFunction enables the implementation of advanced business logic. Apache Flink also features a library for Complex Event Processing (CEP) to detect patterns in data streams. Above all, Apache Flink’s standout feature for event-driven applications is its support for savepoints. A savepoint is a consistent state image that can be used as a starting point for compatible applications. Given a savepoint, an application can be updated or rescaled, or multiple versions of an application can be started for A/B testing.
Examples: fraud detection, anomaly detection, rule-based alerting, business process monitoring, web applications (social networks).

Data Analytics Applications: With a sophisticated stream processing engine, analytics can also be performed in real time. Streaming queries or applications ingest real-time event streams and continuously produce and update results as events are consumed. The results are written to an external database or maintained as internal state. A dashboard application can read the latest results from the external database or directly query the application's internal state. Apache Flink supports both streaming and batch analytical applications.
Examples: quality monitoring of telco networks, analysis of product updates and experiment evaluation in mobile applications, ad-hoc analysis of live data in consumer technology, large-scale graph analysis.

Data Pipeline Applications: Streaming data pipelines serve a similar purpose to extract-transform-load (ETL) jobs: they transform and enrich data and can move it from one storage system to another. However, they run continuously instead of being triggered periodically, so they can read records from sources that continuously produce data and move them with low latency to their destination.
Examples: real-time search index building in e-commerce, continuous ETL in e-commerce.

Release Highlights

Rebase to Apache Flink 1.19.1: Streaming analytics deployments, including SQL Stream Builder, now support Apache Flink 1.19.1, which includes the Apache Flink improvements below. For more information on these improvements and deprecations, please check the Apache Flink 1.19.1 release announcement.
- Custom Parallelism for Table/SQL Sources: The DataGen connector now supports setting a custom parallelism for performance tuning via the scan.parallelism option. Support for other connectors will come in future releases.
- Configure Different State Time-to-Live (TTL) Values Using a SQL Hint: Users now have a more flexible way to specify custom TTL values for the state of regular joins and group aggregations directly within their queries by using the STATE_TTL hint.
- Named Parameters: Named parameters can now be used when calling a function or stored procedure in Flink SQL.
- Support for SESSION Window Table-Valued Functions (TVFs) in Streaming Mode: Users can now use SESSION window TVFs in streaming mode.
- Support for Changelog Inputs for Window TVF Aggregation: Window aggregation operators can now handle changelog streams (e.g., Change Data Capture [CDC] data sources).
- New UDF Type, AsyncScalarFunction: AsyncScalarFunction is a user-defined asynchronous ScalarFunction that allows issuing concurrent function calls asynchronously.
- MiniBatch Optimization for Regular Joins: The new mini-batch optimization can be used for regular joins to reduce intermediate results, especially in cascading join scenarios.
- Dynamic Source Parallelism Inference for Batch Jobs: Allows source connectors to dynamically infer their parallelism based on the actual amount of data to consume.
- Standard YAML for Apache Flink Configuration: Apache Flink has officially introduced full support for the standard YAML 1.2 syntax in the configuration file.
- Profiling JobManager/TaskManager in the Apache Flink Web UI: Support for triggering profiling at the JobManager/TaskManager level.
- New Config Options for Administrator Java Virtual Machine (JVM) Options: A set of administrator JVM options is available to prepend default values to the user-set JVM options for platform-wide JVM tuning.
- Using a Larger Checkpointing Interval When the Source Is Processing Backlog: Users can set execution.checkpointing.interval-during-backlog to use a larger checkpoint interval, which improves throughput while the job is processing backlog, provided the source is backlog-aware.
- CheckpointsCleaner Cleans Individual Checkpoint States in Parallel: When disposing of checkpoints that are no longer needed, every state handle/state file is now disposed of in parallel for better performance.
- Trigger Checkpoints through the Command Line Client: The command line interface supports triggering a checkpoint manually.
- New Interfaces to SinkV2 That Are Consistent with the Source API.
- New Committer Metrics to Track the Status of Committables.

Support for Python User-Defined Functions (UDFs) in SQL Stream Builder: The current JavaScript UDFs in SQL Stream Builder will not work on Java 17 and later because the Nashorn engine was deprecated and removed from the Java Development Kit (JDK). Python UDFs in SQL Stream Builder let customers create new UDFs that will continue to be supported on future JDKs. JavaScript UDFs are deprecated in this release and will be removed in a future release. Cloudera recommends that customers use Python UDFs for all new development and start migrating their JavaScript UDFs to Python UDFs to prepare for future upgrades.
Note: Cloudera Streaming Analytics 1.13 currently supports only JDK versions 8 and 11.

Global Logging Configuration for All SSB Jobs: A new global settings view lets the administrator set default logging configurations. These settings are applied to all streaming jobs by default and can be overridden at the job level. This ensures that a consistent logging standard is applied by default for all users and developers.

Please see the Release Notes for the complete list of fixes and improvements.

Getting to the new release

To upgrade to Cloudera Streaming Analytics 1.13, first make sure your Cloudera Private Cloud Base environment is already upgraded to version 7.1.9 SP1, then follow the instructions in the Cloudera Streaming Analytics upgrade guide.

Resources
- New - What’s New in Cloudera Streaming Analytics 1.13
- Updated - Cloudera Streaming Analytics Documentation
- Updated - Cloudera Stream Processing Product Page
- Cloudera Stream Processing Community Edition
- Accelerate Streaming Pipeline Deployments with New Kubernetes Operators webinar recording
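Flink 1.19's standard-YAML configuration file accepts nested keys as well as flat, dot-separated option names. The backlog checkpointing option above illustrates how the two forms relate.

The minimal, standard-library-only sketch below does not parse YAML itself. A Python dict stands in for what a YAML 1.2 parser would return for the nested block shown in the comment, and the code flattens it into Flink's dotted option names. The interval values are hypothetical examples.

```python
from typing import Any, Dict

# The nested mapping a YAML 1.2 parser would produce from a config file such as:
#   execution:
#     checkpointing:
#       interval: 30s
#       interval-during-backlog: 5min
# The values are hypothetical examples.
NESTED_CONFIG: Dict[str, Any] = {
    "execution": {
        "checkpointing": {
            "interval": "30s",
            "interval-during-backlog": "5min",
        }
    }
}


def flatten_config(nested: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Join nested keys with dots to recover Flink's flat option names."""
    flat: Dict[str, Any] = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into sub-mappings, carrying the accumulated key prefix.
            flat.update(flatten_config(value, full_key))
        else:
            flat[full_key] = value
    return flat


if __name__ == "__main__":
    for option, value in flatten_config(NESTED_CONFIG).items():
        print(f"{option}: {value}")
```

Running the script prints `execution.checkpointing.interval: 30s` and `execution.checkpointing.interval-during-backlog: 5min`. With these settings, the job checkpoints every 30 seconds normally and every 5 minutes while a backlog-aware source reports that it is processing backlog.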
07-18-2024
07:30 AM
We are excited to announce the release of the Cloudera Streaming Analytics Operator as an integral component of Cloudera Streaming - Kubernetes Operator. The Cloudera Streaming Analytics Operator brings Apache Flink and Cloudera SQL Stream Builder to Kubernetes for our customers running Cloudera Streaming - Kubernetes Operator. This is a key release that adds strong streaming analytics capabilities to complement the operator's Kafka features. Cloudera SQL Stream Builder is available in Tech Preview in this release.

Kubernetes started out as a container orchestration platform where customers mostly ran stateless applications and APIs. It has since matured to the point where it can run stateful, large-scale data processing frameworks such as Apache Flink and Apache Kafka. We have seen this shift in Kubernetes usage in our customer base and are excited to meet customers where they are. Enabling Flink applications on Kubernetes is an important part of our vision and complements Cloudera's end-to-end Data in Motion story on Kubernetes.

Use Case

Cloudera Streaming - Kubernetes Operator allows Flink and Kafka customers to deploy their workloads on existing shared Kubernetes clusters. The operator automates the installation and lifecycle management of applications on Kubernetes. This significantly lowers the cost of operations and lets customers focus on use case implementation instead of cluster operations. For customers, this means they can take advantage of:
- Greater hybrid portability and accelerated deployments via streamlined lifecycle management
- Increased resource efficiency via shared clusters
- Improved scalability via Kubernetes orchestration tools like OpenShift

Release Highlights

Cloudera Streaming Analytics Operator 1.0 (Flink and SQL Stream Builder)
- Deploys and manages Flink and SQL Stream Builder
- Supports Flink 1.18.1
- SQL Stream Builder is available in Tech Preview (no support)
- Auto-scaling for Flink deployments
- PyFlink support
- Enterprise ready
- Prometheus integration for monitoring

This adds to the following Streaming Operator capabilities, released earlier:

Cloudera Streams Messaging Operator 1.0 (Kafka)
- Deploys and manages Kafka, ZooKeeper, and Cruise Control
- Supports Kafka 3.7
- Provides rack awareness and high availability
- Supports various authentication options (OAuth, LDAP, mTLS, PLAIN)
- Supports authorization through Kafka Access Control Lists (ACLs)
- Prometheus integration for monitoring

Learn More
- Watch the recording of our release webinar showcasing an end-to-end financial services use case running on the new Kubernetes Operators.
- Read the Operator documentation for detailed information about prerequisites, installation, and configuration.
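The sketch below gives a rough picture of what "deploys and manages Flink" with auto-scaling looks like at the resource level. It builds a `FlinkDeployment` resource as defined by the upstream Apache Flink Kubernetes Operator, with the job autoscaler switched on, and prints it as JSON for `kubectl apply -f`. It makes several assumptions:

- The field names follow the upstream `flink.apache.org/v1beta1` CRD.
- The image, jar path, service account, and resource sizes are hypothetical placeholders.
- The Cloudera Streaming Analytics Operator ships its own Flink images and may expect additional settings.

Check the Operator documentation before using anything like this.

```python
import json
from typing import Any, Dict


def flink_deployment(name: str, image: str, jar_uri: str, parallelism: int = 2) -> Dict[str, Any]:
    """Return a FlinkDeployment resource with the operator's job autoscaler enabled."""
    return {
        "apiVersion": "flink.apache.org/v1beta1",
        "kind": "FlinkDeployment",
        "metadata": {"name": name},
        "spec": {
            "image": image,  # placeholder: use the image provided for your operator
            "flinkVersion": "v1_18",
            "flinkConfiguration": {
                "taskmanager.numberOfTaskSlots": "2",
                # Lets the operator's autoscaler adjust job parallelism.
                "job.autoscaler.enabled": "true",
            },
            "serviceAccount": "flink",  # hypothetical service account name
            "jobManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "taskManager": {"resource": {"memory": "2048m", "cpu": 1}},
            "job": {
                "jarURI": jar_uri,
                "parallelism": parallelism,
                # "stateless" keeps the sketch short; stateful jobs usually
                # use the "savepoint" or "last-state" upgrade modes.
                "upgradeMode": "stateless",
            },
        },
    }


if __name__ == "__main__":
    resource = flink_deployment(
        name="orders-job",  # hypothetical job name, image, and jar path
        image="my-registry/flink:1.18.1",
        jar_uri="local:///opt/flink/usrlib/orders-job.jar",
    )
    print(json.dumps(resource, indent=2))
```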
09-12-2023
11:14 PM
We are pleased to announce the general availability of Cloudera Streaming Analytics (CSA) 1.11 on CDP Private Cloud Base 7.1.9. This release includes improvements to SQL Stream Builder (SSB) as well as an update to Flink 1.16.2. These changes focus on enhancing the user experience and fixing bugs, making the product more robust and stable. Sincere thanks to everyone who helped with this release and did an incredible job getting it ready.

Key features for this release
- Rebase to Apache Flink 1.16.2: Apache Flink 1.16.2 is now supported in CSA 1.11.
- Apache Iceberg support: Support for Apache Iceberg tables using the Iceberg v2 format has been added to Flink and SSB. For more information, see the Creating Iceberg tables documentation.

Links
- What's New in CSA 1.11
- Documentation
- Iceberg Tables
- Iceberg Connector
- REST API v2 Reference
- BLOG: Building a Stateful Intrusion Detection System with SSB
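The sketch below shows the general shape of declaring an Iceberg table from Flink SQL. It assembles a `CREATE TABLE` statement using the option names of Apache Iceberg's Flink connector (`connector`, `catalog-name`, `catalog-type`, `uri`, `warehouse`) plus the `format-version` table property for Iceberg v2. The table name, catalog name, metastore URI, and warehouse path are hypothetical. The exact options CSA and SSB expect may differ, so treat the Creating Iceberg tables documentation as authoritative.

```python
def iceberg_table_ddl(table: str, catalog_name: str, metastore_uri: str, warehouse: str) -> str:
    """Return a Flink SQL CREATE TABLE backed by an Iceberg table in a Hive catalog."""
    return (
        f"CREATE TABLE {table} (\n"
        "  id BIGINT,\n"
        "  data STRING\n"
        ") WITH (\n"
        "  'connector' = 'iceberg',\n"
        f"  'catalog-name' = '{catalog_name}',\n"
        "  'catalog-type' = 'hive',\n"
        f"  'uri' = '{metastore_uri}',\n"
        f"  'warehouse' = '{warehouse}',\n"
        # Requests the Iceberg v2 table format for a newly created table.
        "  'format-version' = '2'\n"
        ")"
    )


if __name__ == "__main__":
    print(
        iceberg_table_ddl(
            "events",  # hypothetical table, catalog, metastore, and warehouse
            "hive_catalog",
            "thrift://metastore-host:9083",
            "hdfs:///warehouse/tablespace/external/hive",
        )
        + ";"
    )
```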
06-30-2023
02:31 AM
We are excited to announce the general availability of Cloudera Streaming Analytics (CSA) 1.10.0 on CDP Private Cloud Base. This release includes a massive set of improvements to SQL Stream Builder (SSB), including built-in widgets for data visualization, as well as a rebase to Flink 1.16. Some of the key features of this release are:
- Rebase to Apache Flink 1.16: Apache Flink 1.16 is now supported in CSA 1.10.
- PyFlink Support: The Python API for Flink is now supported in CSA. Customers can now create Flink DataStream applications using Python, in addition to Java and Scala. They can use it to build scalable batch and streaming workloads such as real-time data processing pipelines, large-scale exploratory data analysis, Machine Learning (ML) pipelines, and ETL processes.
- Built-in Widgets for Data Visualization: Built-in data visualization widgets have been added to the SSB UI to provide a quick and simple way to visualize data from streaming jobs and materialized views in real time.
- Built-in Support for Confluent Schema Registry: A new catalog type in SSB makes it easy to read and write data from Confluent Cloud clusters using their Schema Registry service.
- Flexible Schema Handling for Schema Registry Catalogs: The Cloudera Schema Registry catalog can now handle separate schemas for the message key and payload.

Useful Links
- Documentation
- Release notes
- NEW BLOG: Building a Stateful Intrusion Detection System with SSB
- Cloudera Stream Processing (CSP) Community Edition - Try SSB for free!
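The sketch below is a minimal illustration of a Flink DataStream application written in Python with PyFlink. The sensor-alert logic, field names, and threshold are hypothetical. The parsing and filtering live in plain functions so they can be tested without PyFlink, and PyFlink is imported only inside `build_pipeline`, which wires those functions onto a `StreamExecutionEnvironment` with `from_collection`, `map`, and `filter`.

```python
import json
from typing import Tuple

# Hypothetical threshold for flagging a reading; not taken from the release notes.
ALERT_THRESHOLD_CELSIUS = 30.0


def parse_reading(line: str) -> Tuple[str, float]:
    """Turn a JSON line like {"sensor_id": "s1", "temperature": 31.5} into a tuple."""
    record = json.loads(line)
    return record["sensor_id"], float(record["temperature"])


def is_alert(reading: Tuple[str, float]) -> bool:
    """Keep only readings above the alert threshold."""
    return reading[1] > ALERT_THRESHOLD_CELSIUS


def build_pipeline(env, lines):
    """Attach the parse and filter steps to a PyFlink StreamExecutionEnvironment."""
    # Imported here so the plain helpers above can be used and tested without PyFlink.
    from pyflink.common import Types

    readings = env.from_collection(lines, type_info=Types.STRING()).map(
        parse_reading,
        output_type=Types.TUPLE([Types.STRING(), Types.DOUBLE()]),
    )
    return readings.filter(is_alert)
```

With PyFlink installed, a local run could create `env = StreamExecutionEnvironment.get_execution_environment()` (from `pyflink.datastream`), call `build_pipeline(env, ['{"sensor_id": "s1", "temperature": 31.5}']).print()`, and then `env.execute("sensor-alerts")`. A production job would read from a real source such as Kafka instead of an in-memory collection.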
03-09-2023
04:58 PM
The Cloudera Data in Motion (DiM) team is pleased to announce the general availability of Cloudera Streaming Analytics (CSA) 1.9.0 on CDP Private Cloud Base 7.1.7 SP2 and 7.1.8. This release includes a massive set of improvements to SQL Stream Builder (SSB) as well as an update to Flink 1.15.1. These changes focus on enhancing the user experience and removing objections and blockers in the sales cycle. All the features described below are already available in the Cloudera Stream Processing - Community Edition release, which is the fastest way for you to try them out for free.

Links:
- Documentation
- Release notes
- CSP Community Edition Download and Install
- Blog - A UI That Makes You Want To Stream
- Blog - SQL Stream Builder Data Transformations
- Blog - Job Notifications in SQL Stream Builder

Key features for this release:
- Reworked Streaming SQL Console: The user interface (UI) of SQL Stream Builder (SSB), the Streaming SQL Console, has been reworked with new design elements.
- Software Development Lifecycle (SDLC) support (Tech Preview): Projects are introduced as an organizational element for SQL Stream Builder that allows you to create and collaborate on SQL jobs throughout the SDLC stages with source control. For more information, see the Project structure and development documentation.
- Confluent Schema Registry support: Confluent Schema Registry can be used as a catalog in SQL Stream Builder and Flink. This unblocks the onboarding of customers who use Confluent Kafka with Confluent Schema Registry.
- Improved REST API for SSB: Several new endpoints have been added to the API, making it easier to automate deployments to SSB and to integrate it with other applications.
- Updated CSP Community Edition: The Community Edition has been refreshed to include all these features, including the revamped UI and SSB Projects, and offers the fastest way for you to try them out.

And, as usual, bug fixes, security patches, performance improvements, etc.
10-09-2022
04:01 PM
@Althotta, I tested this on 1.16.2 and the behaviour you described doesn't happen for me. Would you be able to share your flow and processor/controller services configuration? Cheers, André
09-28-2022
03:04 AM
You can get the id of the root process group and import the template there as well. André
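One way to look up the root process group's ID is the NiFi REST API's `GET /flow/process-groups/root` endpoint, whose JSON response carries the ID under `processGroupFlow.id`. The minimal sketch below uses only the Python standard library and assumes an unsecured NiFi 1.x instance. The base URL is a placeholder, and a secured instance would also need an authentication token, which this sketch omits.

```python
import json
import urllib.request


def parse_root_group_id(payload: str) -> str:
    """Pull the process group id out of a ProcessGroupFlowEntity JSON document."""
    return json.loads(payload)["processGroupFlow"]["id"]


def fetch_root_group_id(base_url: str) -> str:
    """Call GET /flow/process-groups/root; base_url is e.g. 'http://localhost:8080/nifi-api'."""
    with urllib.request.urlopen(f"{base_url}/flow/process-groups/root") as response:
        return parse_root_group_id(response.read().decode("utf-8"))
```

The returned ID can then be used as `{id}` in the template upload endpoint.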
09-28-2022
12:23 AM
@Kushisabishii, which version of NiFi are you using? There's an API endpoint for this: POST /process-groups/{id}/templates/upload Cheers, André
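The NiFi version matters because templates and this endpoint exist in NiFi 1.x and were removed in NiFi 2.0. The endpoint expects a `multipart/form-data` request with the template file in a form field named `template`; with curl that is roughly `curl -F template=@my-template.xml http://localhost:8080/nifi-api/process-groups/<id>/templates/upload`.

The minimal sketch below does the same with only the Python standard library. It assumes an unsecured NiFi 1.x instance at a placeholder base URL, with no authentication.

```python
import urllib.request
import uuid
from typing import Tuple


def build_template_upload(template_xml: bytes, filename: str = "template.xml") -> Tuple[bytes, str]:
    """Return (body, content_type) for a multipart/form-data request with one 'template' file field."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="template"; filename="{filename}"\r\n'
        "Content-Type: application/xml\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + template_xml + tail, f"multipart/form-data; boundary={boundary}"


def upload_template(base_url: str, group_id: str, template_xml: bytes) -> str:
    """POST the template to /process-groups/{id}/templates/upload and return the response text."""
    body, content_type = build_template_upload(template_xml)
    request = urllib.request.Request(
        f"{base_url}/process-groups/{group_id}/templates/upload",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

A call such as `upload_template("http://localhost:8080/nifi-api", group_id, open("my-template.xml", "rb").read())` would upload the file. The file name and base URL here are placeholders.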