Join Us at CommunityOverCode Asia 2025: Exploring the Future of Apache Technologies

Community Manager

CommunityOverCode Asia 2025 will take place from July 25-27, 2025, in Beijing. CommunityOverCode (formerly ApacheCon) is the official global conference series of the Apache Software Foundation (ASF). Through keynotes, case studies, training, and hackathons, the conference showcases the latest developments and innovations across Apache projects.

The Apache ecosystem's core value lies in open-source collaboration and community-driven innovation. Apache project contributors, committers, and technical experts from around the world will gather at CommunityOverCode Asia 2025 to share their latest technical insights and experiences in AI, big data, storage, analytics, and more.

Cloudera's open-source contributors will present 7 technical sessions across 4 different venues, showing how the Apache technology stack is transforming modern data processing. From data storage to analytics, we'll share how open-source technologies solve complex data challenges.

Below are the technical highlights we're bringing to the conference:

Session 1: Apache NiFi 2.0 - A New Era of Data Flow Processing

Yan Liu

Chinese Session 2025-07-27 14:30 GMT+8 (ROOM: Mtn BaiWang Hall)

As a contributor to Apache Hive and Apache Flink, Yan Liu will provide an in-depth introduction to Apache NiFi 2.0's new features. This presentation will focus on two core innovations, NiFi Cloud Native and NiFi Functions, and explore how these capabilities enable more flexible and scalable data flow processing systems. Drawing on over 10 years of practical experience in big data, Yan Liu will share how NiFi integrates with Apache Flink, Apache Hive, and Apache Iceberg when building real-time data warehouses.

Session 2: Data Storage and Computing Infrastructure

Sammi Chen

Chinese Session 2025-07-26 15:45 GMT+8 (ROOM: Mtn WanShou Hall)

Sammi Chen will explore storage and computing infrastructure solutions within the Apache ecosystem. This presentation will analyze how to build distributed storage systems capable of supporting large-scale data processing, covering storage system architecture design, performance optimization, and scalability considerations. Through practical case studies, the session will demonstrate how to deploy and manage highly available storage and computing clusters in enterprise environments, laying a solid infrastructure foundation for subsequent data analysis and processing.

Our data lake sessions comprehensively present the development and application of modern data lake technologies:

Session 3: Enterprise Data Lake Governance and Management

Bill Zhang

English Session 2025-07-25 14:00 GMT+8 (ROOM: WanChun Hall)

Bill Zhang will share governance frameworks and management best practices for enterprise data lakes. This presentation will explore security strategies, access control, and compliance requirements in data lake environments, demonstrating how to implement effective data governance policies in large-scale data environments.

Session 4: Apache Iceberg Metadata Optimization Techniques

Daniel Becker

English Session 2025-07-26 14:30 GMT+8 (ROOM: WanChun Hall)

Daniel Becker will analyze Apache Iceberg's metadata management mechanisms, focusing on how intelligent metadata processing improves query performance. The presentation will cover Iceberg's metadata table functionality, query optimization strategies, and table maintenance workflows, providing attendees with practical performance tuning guidelines.

Session 5: Apache Iceberg Table Lifecycle Management

Bill Zhang

English Session 2025-07-26 16:15 GMT+8 (ROOM: WanChun Hall)

Bill Zhang will give a comprehensive introduction to Apache Iceberg table lifecycle management practices. This session will cover table creation, maintenance, optimization, and archival strategies, highlighting best practices for compression techniques, partition management, and schema evolution. Through practical examples, the presentation will help audiences master efficient Iceberg table management methods in production environments.

Session 6: Advanced Data Lake Optimization and Operations

Attila Turóczy

English Session 2025-07-27 15:45 GMT+8 (ROOM: WanChun Hall)

Attila Turóczy will share advanced optimization techniques and operational experience for data lakes. The presentation will focus on cutting-edge performance tuning methods, monitoring system construction, and troubleshooting strategies, helping enterprises achieve excellence in data lake system operations.

Session 7: Apache Impala High-Performance Analytics Engine

Quanlong Huang

Chinese Session 2025-07-26 15:00 GMT+8 (ROOM: Mtn BaiWang Hall)

Quanlong Huang will introduce the latest technical advances of Apache Impala in OLAP and data analytics. The presentation, delivered in Chinese, will highlight Impala's query optimization techniques, distributed execution mechanisms, and integration practices with modern data lake architectures. The session will cover Impala's performance optimization strategies in large-scale data analysis scenarios, including query plan optimization, memory management, and concurrency control, providing technical guidance for enterprises that want to achieve high-performance real-time analytics.

Apache Technology Ecosystem Implementation

Together, our presentations cover the Apache technology ecosystem end to end, from data flow processing to high-performance analytics. This series of sessions not only showcases the unique advantages of individual Apache projects but, more importantly, reveals how they work together in modern data architectures.

From a data flow processing perspective, Apache NiFi 2.0's Cloud Native and Functions capabilities give enterprises more flexible and scalable data ingestion and processing. This cloud-native design philosophy lets data flow processing better adapt to modern enterprises' distributed computing needs, while simplifying the implementation of complex data transformation logic through functional programming models.

At the storage infrastructure layer, the Apache ecosystem provides powerful distributed storage solutions capable of supporting petabyte-scale data storage requirements. Modern enterprise data architectures need to handle structured, semi-structured, and unstructured data simultaneously. Apache's storage technology stack achieves seamless integration and efficient management of different data types through unified interfaces and API designs.

Data lake technology development represents an important trend in modern data architecture. Apache Iceberg, as the next-generation table format, addresses traditional data lake pain points in query performance, data consistency, and table management through its advanced metadata management mechanisms. From enterprise governance to technical implementation, from table lifecycle management to advanced optimization techniques, our data lake technology sessions comprehensively demonstrate how to build and operate enterprise-level modern data lake platforms.

Finally, Apache Impala serves as a high-performance analytics engine, providing powerful query and analysis capabilities for the entire data platform. Its distributed query execution, intelligent optimization, and deep integration with data lakes enable enterprises to achieve near real-time analytical responses on large-scale datasets, meeting modern business demands for immediate data insights.

These topics paint a picture of a complete data platform architecture based on Apache open-source technologies, showcasing how the open-source community provides solid technical foundations for enterprise digital transformation through technological innovation and collaboration.

We cordially invite you to join us on-site to explore these technical components in depth and work together to build technology systems that are ready for the future.

Join us at CommunityOverCode Asia 2025

For more information, please visit: https://asia.communityovercode.org/
