I'm working with NiFi to grab Parquet files from an S3 bucket. But when I read in the Parquet files, the arrays in the data end up in the following format:
[
  {
    "id": 1,
    "name": "John",
    "address": {
      "street": "Main St",
      "city": "New York"
    },
    "hobbies": [
      {
        "element": "coding"
      },
      {
        "element": "music"
      }
    ],
    "greetings": [
      {
        "element": {
          "intro": "hello",
          "end": "bye"
        }
      },
      {
        "element": {
          "intro": "hola",
          "end": "adios"
        }
      }
    ],
    "gender": [
      {
        "element": "M"
      }
    ],
    "record_id": [
      {
        "element": "2a2c6c86947719eacc1742adf1d6f2c7"
      }
    ]
  }
]
Instead of the desired format:
[
  {
    "id": 1,
    "name": "John",
    "address": {
      "street": "Main St",
      "city": "New York"
    },
    "hobbies": [
      "coding",
      "music"
    ],
    "greetings": [
      {
        "intro": "hello",
        "end": "bye"
      },
      {
        "intro": "hola",
        "end": "adios"
      }
    ],
    "gender": [
      "M"
    ],
    "record_id": [
      "2a2c6c86947719eacc1742adf1d6f2c7"
    ]
  }
]
The downstream processes cannot be changed, and they cannot handle arrays whose items are these single-key "element" maps.
When I try to use a ConvertRecord processor with a ParquetRecordSetWriter to write the records back out with the arrays formatted correctly, I get the following error:

There are a variety of fields in the data that are arrays, so it's not feasible to specify handling for each array field individually. Is there some schema handling I can do with ConvertRecord to avoid this error? It seems like it's writing the data out in the correct format and then hitting the schema conflict because the rewritten arrays no longer match the original schema. Alternatively, is there a better way to handle nested data coming from Parquet files?
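For reference, the transformation I'm after is easy to express outside of NiFi. Here is a rough Python sketch (purely illustrative, not something running in the flow) of the unwrapping I need: any array whose items are single-key {"element": ...} maps should be flattened to just the element values, applied recursively across all fields.

import json

def unwrap_elements(value):
    # Lists: unwrap items that are single-key {"element": ...} maps, then recurse.
    if isinstance(value, list):
        unwrapped = []
        for item in value:
            if isinstance(item, dict) and set(item.keys()) == {"element"}:
                item = item["element"]
            unwrapped.append(unwrap_elements(item))
        return unwrapped
    # Nested records: recurse into each field.
    if isinstance(value, dict):
        return {key: unwrap_elements(val) for key, val in value.items()}
    # Scalars pass through unchanged.
    return value

record = {
    "id": 1,
    "hobbies": [{"element": "coding"}, {"element": "music"}],
    "greetings": [{"element": {"intro": "hello", "end": "bye"}}],
}
print(json.dumps(unwrap_elements(record), indent=2))
# -> {"id": 1, "hobbies": ["coding", "music"], "greetings": [{"intro": "hello", "end": "bye"}]}

Because the real data has many more array fields than shown above, I need something that applies generically like this rather than per field, which is why I'm hoping for a schema-level or record-level answer inside NiFi.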