ConvertRecord fails for some files

Hello all,

I'm trying to convert many records from JSON to Parquet with the ConvertRecord processor.
Most conversions succeed, but some files fail with this error:

 

org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: optional group pop_pools

 

I assume this is because some of the JSON files contain the following field:

 

"pop_pools": {}

 

Because the source records are log data, we can't modify them.
Is there any way to avoid this error and convert the records to Parquet format?

Thanks,

1 ACCEPTED SOLUTION

Hi @tono425 ,

You cannot write an empty struct in Parquet.

This is due to how the Parquet format works. A Parquet file stores data only for leaf fields; the intermediate structure is not stored and is instead inferred from the schema together with the repetition levels and definition levels of the written leaf fields. An empty struct (which is written as a group) has no leaf fields, which is why Parquet fails to write it. I would suggest either changing the output format or filtering out the value before converting.
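
As a rough illustration of the "filter the value before converting" option, here is a minimal Python sketch that drops every key whose value is an empty object, such as "pop_pools": {}. The function name prune_empty_objects and the sample record are made up for this example, and how you run it ahead of ConvertRecord (a scripted step in the flow, or a preprocessing job outside NiFi) is left open, so treat it as a starting point rather than a tested NiFi configuration.

```python
import json


def prune_empty_objects(value):
    """Return a copy of value without keys whose value is an empty object.

    Hypothetical helper for illustration; it is not part of NiFi or Parquet.
    """
    if isinstance(value, dict):
        pruned = {}
        for key, child in value.items():
            child = prune_empty_objects(child)
            # Skip children that are, or have become, empty objects.
            if isinstance(child, dict) and not child:
                continue
            pruned[key] = child
        return pruned
    if isinstance(value, list):
        # Recurse into array elements, but keep the array length unchanged.
        return [prune_empty_objects(item) for item in value]
    return value


# Sample record invented for this example; only "pop_pools" comes from the question.
record = json.loads(
    '{"host": "a1", "pop_pools": {}, "stats": {"hits": 3, "extra": {}}}'
)
cleaned = prune_empty_objects(record)
print(json.dumps(cleaned))  # {"host": "a1", "stats": {"hits": 3}}
```

The original log data is not modified, since the function builds a new structure. An object that only contains empty objects is removed as well, because it would also end up as a group with no leaf fields. Empty objects that sit directly inside an array are left in place here, so that case would need its own handling if it occurs in your data.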

@cloude
Thank you for your answer.
Now I understand that this is expected behavior.
I'll consider a solution.

Thanks,