Issue when merging different schemas into Iceberg tables from Spark

When appending a DataFrame from Spark to an Iceberg table like this:

df.writeTo("tablename").append()

the following error appears if the schemas of the DataFrame and the table differ:

Exception in thread "main" org.apache.spark.sql.AnalysisException: Cannot write incompatible data to table 'catalog.tablename':
- Cannot find data for output column 'column3'
- Cannot find data for output column 'column4'

This happens because, by default, Iceberg does not accept writes whose schema differs from the table schema.
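
For context, here is a minimal reproduction sketch in PySpark. The catalog name ("catalog"), warehouse path, and column names are illustrative assumptions; adjust them to your environment:

from pyspark.sql import SparkSession

# Assumes the Iceberg Spark runtime jar is on the classpath; configures
# a Hadoop-type catalog named "catalog" (illustrative values).
spark = (
    SparkSession.builder
    .appName("iceberg-schema-merge-demo")
    .config("spark.sql.catalog.catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.catalog.type", "hadoop")
    .config("spark.sql.catalog.catalog.warehouse", "/tmp/iceberg-warehouse")
    .getOrCreate()
)

# The table has four columns ...
spark.sql("""
    CREATE TABLE IF NOT EXISTS catalog.tablename (
        column1 STRING, column2 STRING, column3 STRING, column4 STRING
    ) USING iceberg
""")

# ... but the DataFrame only carries two of them, so the append below
# fails with "Cannot find data for output column 'column3'".
df = spark.createDataFrame([("a", "b")], ["column1", "column2"])
df.writeTo("catalog.tablename").append()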

Solution

First, set the following table property on the Iceberg table so that it accepts writes with a different schema:

ALTER TABLE tablename SET TBLPROPERTIES (
'write.spark.accept-any-schema'='true'
)
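
The same property can also be set from within the Spark session (a sketch, reusing the catalog.tablename identifier assumed above):

# Equivalent to running the ALTER TABLE statement in a SQL client.
spark.sql("""
    ALTER TABLE catalog.tablename SET TBLPROPERTIES (
        'write.spark.accept-any-schema' = 'true'
    )
""")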

Then add the mergeSchema option, set to true, to the append command in Spark:

df.writeTo("tablename").option("mergeSchema", "true").append()

With these changes, Iceberg accepts the write and adjusts the table schema as needed.
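
Putting both steps together, here is a sketch of the corrected append under the same assumed names; because Iceberg resolves columns by name, the columns missing from the DataFrame should be filled with NULL:

# Accepted now that the table property is set and mergeSchema is enabled.
df.writeTo("catalog.tablename").option("mergeSchema", "true").append()

# Inspect the result; column3 and column4 should be NULL for the new rows.
spark.table("catalog.tablename").show()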

 

To read more about this behaviour, see the official documentation:

https://iceberg.apache.org/docs/1.6.0/spark-writes/#schema-merge
