Regular compaction is essential for Apache Iceberg tables to address the issue of small files, which accumulate as transaction volume increases and degrade query performance. Iceberg offers the rewrite_data_files Spark procedure as the mechanism for performing this compaction.
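For reference, a minimal manual invocation of this procedure in Spark SQL might look like the following sketch; the catalog and table names (spark_catalog, default.orders) are placeholders, and binpack is the default strategy.

```sql
-- Minimal sketch of the Iceberg compaction procedure; names are illustrative.
CALL spark_catalog.system.rewrite_data_files(
  table    => 'default.orders',
  strategy => 'binpack',
  options  => map('target-file-size-bytes', '536870912')  -- 512 MB target files
);
```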
The Cloudera Lakehouse Optimizer (CLO) is an automated management service designed to optimize Apache Iceberg tables, thereby improving performance and reducing operational costs.
It allows administrators to define maintenance policies for tables. The system evaluates these policies by gathering statistics from Iceberg metadata, which subsequently triggers maintenance actions. These actions include data compaction, snapshot expiration, and the removal of orphan files. It offers execution flexibility, supporting both schedule-based execution for predictable workloads and event-based triggers for real-time data changes.
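These maintenance actions correspond to Iceberg's built-in Spark procedures. As a rough sketch (catalog, table, and retention values are illustrative, not CLO's exact invocations), snapshot expiration and orphan file removal can be run manually like this:

```sql
-- Sketch only: illustrative catalog/table names and retention values.
-- Expire old snapshots, keeping the 10 most recent.
CALL spark_catalog.system.expire_snapshots(
  table       => 'default.orders',
  retain_last => 10
);

-- Remove files no longer referenced by any table metadata.
CALL spark_catalog.system.remove_orphan_files(
  table => 'default.orders'
);
```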
Lakehouse Optimizer maintenance actions are executed on Spark. This blog focuses on the key parameters within the Lakehouse Optimizer policy that should be configured to enhance compaction performance.
The ClouderaAdaptive JEXL script is the default template provided by Lakehouse Optimizer. This script uses Iceberg metadata to gather table statistics and determine whether an action should be initiated. To leverage this template for decision-making, a new policy must be created. Arguments for each action can then be supplied through the policy constants JSON. For comprehensive details on policy configuration, please refer to this documentation.
Consider an Apache Iceberg table with the following specifications:
| Metric | Value |
|---|---|
| Number of Partitions | 10 |
| Number of Files | 363,775 |
| Average File Size | 0.5 MB |
| Total Size | 180.85 GB |
| Number of Rows | 1.4 Billion |
Calculated metrics: each partition holds, on average, roughly 36,000 files and about 18 GB of data.
The rewrite data files action is executed on this table using a Spark environment configured with a 16 GB executor size. The table below covers the two key parameters to tune for this table; a sketch of the corresponding Spark procedure options follows it.
| Metric | Recommendation for the above table | Why? |
|---|---|---|
| Max File Group Size | 3 GB | 1 GB is too small (too many commits). 3 GB creates groups of ~6,000 files, which is safe for a 16 GB executor while remaining efficient. |
| Max Concurrent Groups | 5-10 | With smaller 3 GB groups, concurrency can safely be increased to keep all the Spark cores busy. |
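
For reference, these recommendations map onto Iceberg's rewrite options when the procedure is called directly; a sketch with an illustrative table name:

```sql
-- Sketch: the recommended limits expressed as rewrite_data_files options.
CALL spark_catalog.system.rewrite_data_files(
  table   => 'default.orders',
  options => map(
    'max-file-group-size-bytes', '3221225472',     -- 3 GB groups (~6,000 half-MB files)
    'max-concurrent-file-group-rewrites', '10',    -- upper end of the 5-10 recommendation
    'target-file-size-bytes', '536870912'          -- 512 MB output files
  )
);
```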
The rewrite_data_files action incorporates a feature called Partial Progress, which can be activated by setting the option partial-progress.enabled to true. When enabled, this feature facilitates incremental commitment of changes rather than deferring the commit until the entire job is complete.
The fundamental operational unit in a rewrite process is the File Group, which comprises a set of small files intended for consolidation.
Standard Behaviour (Default)
Under the standard configuration, the job processes all file groups before executing a single, final, large-scale commit.
Partial Progress Behaviour
When Partial Progress is enabled, the job processes a defined batch of file groups, commits these changes immediately, and then proceeds to the next batch. The partial-progress.max-commits option determines the maximum number of commits the job will attempt. If there are 100 file groups and this is set to 10, each commit will contain roughly 10 file groups.
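For illustration, the same options can be passed when invoking the Spark procedure directly (table name is illustrative):

```sql
-- Sketch: incremental commits during compaction; names are illustrative.
CALL spark_catalog.system.rewrite_data_files(
  table   => 'default.orders',
  options => map(
    'partial-progress.enabled', 'true',
    'partial-progress.max-commits', '10'   -- ~10 file groups per commit when there are 100 groups
  )
);
```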
The rewrite data files action supports filtering data files based on an expression. For large tables with a very high number of small files, compacting everything in a single run can be time consuming and may cause out-of-memory errors, so the work can be split into multiple maintenance tasks by using the filter expression efficiently.
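Outside of CLO, the Spark procedure also accepts a where argument for this purpose; a sketch that splits the table's compaction into two passes, assuming a day_of_week partition column:

```sql
-- Sketch: splitting compaction into two smaller passes.
-- Table and partition column (day_of_week) are illustrative.
CALL spark_catalog.system.rewrite_data_files(
  table => 'default.orders',
  where => "day_of_week IN ('Mon', 'Tue', 'Wed')"
);
CALL spark_catalog.system.rewrite_data_files(
  table => 'default.orders',
  where => "day_of_week IN ('Thu', 'Fri', 'Sat', 'Sun')"
);
```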
CLO supports setting the filter expression using table properties. For example, for an Iceberg table 'orders' we can set the filter expression with:
```sql
ALTER TABLE default.orders SET TBLPROPERTIES (
  'dlm.rewriteDataFiles.where' = "day_of_week IN ('Mon', 'Fri')"
);
```

To perform the rewriting of data files on a table using the Lakehouse Optimizer, a new policy must be created, based on the default ClouderaAdaptive template. The Lakehouse Optimizer provides REST APIs for uploading policy definitions, with detailed information available here.
The policy's action arguments are defined using a JSON structure. The goal here is to create a policy that runs the rewrite data files action with partial progress enabled and a higher rewrite file group concurrency.
It is worth noting that the ClouderaAdaptive templates support configuring the maximum rewrite file group concurrency and enabling partial progress.
```json
{
  "script": "dlm://tps:default/ClouderaAdaptive",
  "cron": "0 0/30 * * * ?",
  "rewriteDataFiles": {
    "enabled": true,
    "targetFileSize": 536870912,
    "maxConcurrentRewriteFileGroups": 10,
    "partialProgressEnabled": true,
    "partialProgressMaxCommits": 10
  }
}
```
To configure the file group size, which is not currently supported by the ClouderaAdaptive template, a new policy template must be created. A policy template is a JEXL file that contains the logic necessary for generating maintenance actions. Comprehensive information regarding policy resources is accessible here. The template below simply generates a rewrite data files action builder based on the arguments supplied. It can be uploaded using the REST APIs, during which a URI must be provided. Further specifics on the required Lakehouse Optimizer URI format are documented here.
```
// Read action arguments from the policy constants, with defaults.
let partialProgressEnabled = $constants.rewriteDataFiles.partialProgressEnabled ?? true;
let partialProgressMaxCommits = $constants.rewriteDataFiles.partialProgressMaxCommits ?? 10;
let maxConcurrentRewriteFileGroups = $constants.rewriteDataFiles.maxConcurrentRewriteFileGroups ?? 5;
// Default max file group size of 3 GB
let maxFileGroupSize = $constants.rewriteDataFiles.maxFileGroupSize ?? 3221225472;
// Default target file size of 512 MB
let targetFileSize = $constants.rewriteDataFiles.targetFileSize ?? 512*1024*1024;

// Build the rewrite data files action from the resolved arguments.
return dlm:rewriteDataFiles($table)
    .targetFileSize(targetFileSize)
    .partialProgressEnabled(partialProgressEnabled)
    .partialProgressMaxCommits(partialProgressMaxCommits)
    .maxConcurrentRewriteFileGroups(maxConcurrentRewriteFileGroups)
    .option('max-file-group-size-bytes', maxFileGroupSize)
```

You should create a new policy that utilizes this template and includes the necessary arguments for rewriting data files.
```json
{
  "script": "dlm://tps/hive/ClouderaAdaptive",
  "cron": "0 0/30 * * * ?",
  "rewriteDataFiles": {
    "enabled": true,
    "targetFileSize": 536870912,
    "maxConcurrentRewriteFileGroups": 10,
    "maxFileGroupSize": 3221225472,
    "partialProgressEnabled": true,
    "partialProgressMaxCommits": 10
  }
}
```
Compaction is a critical maintenance operation for optimizing Apache Iceberg query performance, particularly in tables with high transaction volumes resulting in many small files. As demonstrated, blindly increasing concurrency or file group size can lead to resource exhaustion, specifically out-of-memory errors in Spark executors, when dealing with a very large number of small files.
The key to successful and robust compaction lies in a balanced configuration: a right-sized maximum file group size, bounded rewrite file group concurrency, partial progress for incremental commits, and filter expressions that split very large backlogs into smaller maintenance tasks.
By thoughtfully applying these parameter optimizations, users can transform the rewriteDataFiles procedure from a potential source of errors into an efficient, reliable, and performance-enhancing maintenance routine.
Reviewed by - Naveen Gangam, Adam Benlemlih, Dipankar Mazumdar, Henri Biestro, Prabhat Mishra