Created on 03-13-2026 01:34 PM - last edited on 04-20-2026 11:10 PM by GrazittiAPI
The conversation around cloud adoption has matured significantly. It's no longer a question of if enterprises should use the cloud, but how they can strategically blend public cloud agility with the security and control of their on-premises infrastructure. This hybrid approach is now the dominant strategy for modern data-driven organisations.
Organisations employ two complementary strategies for hybrid cloud adoption: workload migration to cloud and cloud bursting.
While traditional migration involves the permanent relocation of applications and datasets to the cloud for modernisation, cloud bursting dynamically extends a private data center into a public cloud. This provides temporary, on-demand compute to handle demand spikes, scaling back down as capacity needs subside.
These two strategies co-exist. Migration is a long-term approach for modernising to cloud-native workloads, whereas bursting provides immediate compute elasticity for workloads that are retained on-premises, bypassing physical hardware procurement cycles.
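As a rough illustration of the bursting idea, the sketch below estimates how many cloud nodes would be needed to absorb demand that exceeds on-premises capacity. The function name, the core-based sizing, and the example numbers are assumptions made for illustration; they are not part of any Cloudera API, and real autoscaling policies consider far more than core counts.

```python
import math


def burst_nodes_needed(demand_cores: int, onprem_cores: int, cores_per_cloud_node: int) -> int:
    """Return the number of cloud nodes needed to cover demand above on-prem capacity.

    Hypothetical sizing helper: only the overflow beyond on-premises
    capacity is sent to the cloud, and it is rounded up to whole nodes.
    """
    if cores_per_cloud_node <= 0:
        raise ValueError("cores_per_cloud_node must be positive")
    # Demand that the on-premises cluster cannot serve on its own.
    overflow = max(0, demand_cores - onprem_cores)
    return math.ceil(overflow / cores_per_cloud_node)


# During a spike: 500 cores requested, 400 available on-premises.
spike_nodes = burst_nodes_needed(500, 400, 32)
# After the spike subsides, no cloud nodes are needed.
idle_nodes = burst_nodes_needed(300, 400, 32)
```

When demand falls back under on-premises capacity the result drops to zero, which mirrors the "scale back down" behaviour that distinguishes bursting from permanent migration.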
Building and operating a hybrid estate introduces its own significant operational challenges.
Simply connecting an on-premises data center to a public cloud doesn't create a true hybrid platform. Without a unified strategy, organisations quickly face:
To solve these problems, a platform must be built on a truly hybrid-native foundation.
At Cloudera, we believe a true hybrid cloud platform must deliver a seamless, unified experience. Our strategy is built on four key tenets:
In this blog, we focus on Hybrid Environments and Data Hubs, and how they enable seamless extension of on-premises infrastructure to the cloud.
Before detailing Cloudera Hybrid Data Hubs, it is essential to note the contrast with a “lift-and-shift” cloud migration architecture.
In this model, data and metadata are replicated from the on-premises environment (like HDFS) to cloud storage (like Amazon S3). Processing is then done entirely in the cloud using the replicated data.
This architecture is well suited to replication. Applied to ephemeral cloud bursting, however, it creates overhead from maintaining multiple data copies, adds complexity in keeping data synchronised and consistent, and increases storage costs.
To natively enable cloud bursting, Cloudera is introducing Hybrid Environments and Data Hubs.
Cloudera Hybrid Environments and Data Hubs combine cloud-native elasticity, including provisioning and autoscaling, with a built-in capability to securely access datasets directly from an associated Cloudera on-premises cluster.
To put this into context, a workload (e.g., Spark) submitted to the Data Hub reads/writes data and metadata directly from the associated Cloudera on-premises cluster’s storage (e.g., HDFS), all authorised, audited, and governed by Cloudera SDX.
The Cloudera Hybrid Data Hub deployment architecture has the following building blocks:
Hybrid Data Hub allows you to operate with the agility of the cloud while leveraging your existing infrastructure through the following key advantages.
In addition to being a native architecture for cloud bursting, this also unlocks other powerful applications for your business, including:
We now move from theory to practice. In-place data access eliminates the expensive and complex maintenance of persistent data copies for ephemeral cloud bursts, but performance varies with the infrastructure (such as network bandwidth and latency) and the specific workload profile.
We ran comparative benchmarks of Spark SQL workloads at enterprise scale on Hybrid Data Hubs to determine viability and to identify the infrastructure and workload factors that most influence performance.
Full text of the performance benchmark can be viewed here.
The benchmarking exercise establishes how network bandwidth, file format, and compression settings affect performance in hybrid cloud environments where compute runs in the cloud and data remains on-premises.
Overall, the strategic use of columnar formats and compression enables many workloads to run efficiently in hybrid environments, even with limited network capacity.
For CPU-intensive Spark jobs, this setup can be a viable architecture for burst-to-cloud use cases. In contrast, I/O-intensive jobs remain highly sensitive to network limits, making this approach less suitable for data-heavy pipelines without further optimisation.
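The interplay between bandwidth, columnar formats, and compression can be approximated with a back-of-the-envelope model. The sketch below is a simplification under stated assumptions, not the benchmark's methodology: it ignores latency, protocol overhead, and overlap between compute and I/O, and the names `transfer_seconds`, `bottleneck`, `compression_ratio`, and `column_fraction` are hypothetical.

```python
def transfer_seconds(scan_gb: float, bandwidth_gbps: float,
                     compression_ratio: float = 1.0,
                     column_fraction: float = 1.0) -> float:
    """Estimate the time to pull a scan's data across the hybrid link.

    scan_gb           -- uncompressed size of the tables scanned, in gigabytes
    bandwidth_gbps    -- usable link bandwidth, in gigabits per second
    compression_ratio -- uncompressed size divided by compressed size
    column_fraction   -- share of the data actually read (columnar pruning)
    """
    if bandwidth_gbps <= 0 or compression_ratio <= 0:
        raise ValueError("bandwidth_gbps and compression_ratio must be positive")
    # Gigabytes that actually cross the network after pruning and compression.
    wire_gb = scan_gb * column_fraction / compression_ratio
    # Convert gigabytes to gigabits, then divide by link speed.
    return wire_gb * 8 / bandwidth_gbps


def bottleneck(compute_seconds: float, scan_gb: float, bandwidth_gbps: float,
               compression_ratio: float = 1.0, column_fraction: float = 1.0) -> str:
    """Label a job 'network' or 'compute' bound under this simple model."""
    io_seconds = transfer_seconds(scan_gb, bandwidth_gbps,
                                  compression_ratio, column_fraction)
    return "network" if io_seconds > compute_seconds else "compute"


# A 1,000 GB scan over a 10 Gbps link, with 600 seconds of CPU work.
raw = bottleneck(600, 1000, 10)
# The same job reading a quarter of the columns at 4:1 compression.
tuned = bottleneck(600, 1000, 10, compression_ratio=4, column_fraction=0.25)
```

With these illustrative numbers, the uncompressed full scan needs about 800 seconds on the wire and is network bound, while the pruned and compressed scan needs about 50 seconds and becomes compute bound. That is the pattern described above: format and compression choices can move a job from unsuitable to viable for bursting.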
Get started with Hybrid Data Hub setup to natively burst on-premises workloads to the cloud without creating data copies or rewriting applications.