Created 01-16-2025 12:06 AM
Hello all,
I'm trying to convert many records from JSON to Parquet with the ConvertRecord processor.
Most succeed, but the conversion fails for some files with this error:
org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: optional group pop_pools
I assume this is because some JSON files contain the following field:
"pop_pools": {}
As the source records are log data, we can't modify them.
Is there any way to avoid this error and convert the records to Parquet format?
Thanks,
Created 01-16-2025 10:28 PM
Hi @tono425 ,
You cannot write an empty struct in Parquet.
This is due to the way the Parquet format works: a Parquet file stores only leaf field data. The intermediate structure is not stored; it is reconstructed from the schema together with the repetition and definition levels of the written leaf fields. An empty struct (which is written as a group) has no leaf fields, which is why Parquet refuses to write it. I would suggest either changing the output format or filtering out the empty value before converting.
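One way to apply the filtering suggestion, given that the source logs themselves can't be changed, is to strip empty objects in a preprocessing step before ConvertRecord. Below is a minimal sketch in Python, assuming newline-delimited JSON records and assuming downstream consumers can accept a field like `pop_pools` being absent when it was `{}`. The function names `drop_empty_objects` and `clean_json_lines` are hypothetical and not part of NiFi; how such a step is run inside your flow is left open here.

```python
import json
from typing import Any, Iterable, Iterator


def drop_empty_objects(value: Any) -> Any:
    """Recursively remove empty JSON objects.

    A dict field is dropped if its value is {} or becomes {} once its own
    empty children are removed (an empty object has no leaf fields, so it
    would become an empty Parquet group). Inside lists, elements that are
    (or become) empty dicts are removed; other values are kept unchanged.
    """
    if isinstance(value, dict):
        cleaned = {}
        for key, child in value.items():
            child = drop_empty_objects(child)
            if isinstance(child, dict) and not child:
                continue  # skip empty objects such as "pop_pools": {}
            cleaned[key] = child
        return cleaned
    if isinstance(value, list):
        items = [drop_empty_objects(item) for item in value]
        return [item for item in items if not (isinstance(item, dict) and not item)]
    return value  # scalars (str, int, float, bool, None) pass through


def clean_json_lines(lines: Iterable[str]) -> Iterator[str]:
    """Clean newline-delimited JSON records, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.dumps(drop_empty_objects(json.loads(line)))


if __name__ == "__main__":
    sample = {"host": "web-1", "pop_pools": {}, "meta": {"tags": {}}, "n": 1}
    print(json.dumps(drop_empty_objects(sample)))  # {"host": "web-1", "n": 1}
```

The sketch drops the key rather than, say, replacing `{}` with `null`, so no empty object is left for the writer's schema to turn into an empty group; the trade-off is that the field is simply missing from those records.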
Created 01-20-2025 12:20 AM
@cloude
Thank you for your answer.
Now I understand that is expected behavior.
I'll consider a solution.
Thanks,