<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question ParquetReader incorrectly reading arrays in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/ParquetReader-incorrectly-reading-arrays/m-p/400760#M250958</link>
    <description>&lt;P&gt;I'm working with NiFi to pull Parquet files from an S3 bucket. But when I read the Parquet files in, the arrays in the data end up in the following format:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="java"&gt;[
    {
        "id": 1,
        "name": "John",
        "address": {
            "street": "Main St",
            "city": "New York"
        },
        "hobbies": [
            {
                "element": "coding"
            },
            {
                "element": "music"
            }
        ],
        "greetings": [
            {
                "element": {
                    "intro": "hello",
                    "end": "bye"
                }
            },
            {
                "element": {
                    "intro": "hola",
                    "end": "adios"
                }
            }
        ],
        "gender": [
            {
                "element": "M"
            }
        ],
        "record_id": [
            {
                "element": "2a2c6c86947719eacc1742adf1d6f2c7"
            }
        ]
    }
]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Instead of the desired format:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="java"&gt;[
    {
        "id": 1,
        "name": "John",
        "address": {
            "street": "Main St",
            "city": "New York"
        },
        "hobbies": [
            "coding",
            "music"
        ],
        "greetings": [
            {
                "intro": "hello",
                "end": "bye"
            },
            {
                "intro": "hola",
                "end": "adios"
            }
        ],
        "gender": [
            "M"
        ],
        "record_id": [
            "2a2c6c86947719eacc1742adf1d6f2c7"
        ]
    }
]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The downstream processes cannot be changed and cannot handle arrays whose items are each wrapped in a single-key "element" map.&lt;/P&gt;&lt;P&gt;When I try to use a ConvertRecord processor to write the records out with a ParquetRecordSetWriter to get the arrays formatted correctly, I get the following error:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="schema_error.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/43596i1F7BC8FE68B5CBFC/image-size/medium?v=v2&amp;amp;px=400" role="button" title="schema_error.png" alt="schema_error.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Many fields in the data are arrays, so it's not feasible to specify handling for each array field individually. Is there some schema handling I can do with ConvertRecord to avoid this error? It seems like it's writing the data out in the correct format and running into the schema conflict because of it. Alternatively, is there a better way to handle nested data coming from Parquet files?&lt;/P&gt;</description>
    <pubDate>Mon, 20 Jan 2025 20:22:18 GMT</pubDate>
    <dc:creator>birdy</dc:creator>
    <dc:date>2025-01-20T20:22:18Z</dc:date>
    <item>
      <title>ParquetReader incorrectly reading arrays</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ParquetReader-incorrectly-reading-arrays/m-p/400760#M250958</link>
      <description>&lt;P&gt;I'm working with NiFi to pull Parquet files from an S3 bucket. But when I read the Parquet files in, the arrays in the data end up in the following format:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="java"&gt;[
    {
        "id": 1,
        "name": "John",
        "address": {
            "street": "Main St",
            "city": "New York"
        },
        "hobbies": [
            {
                "element": "coding"
            },
            {
                "element": "music"
            }
        ],
        "greetings": [
            {
                "element": {
                    "intro": "hello",
                    "end": "bye"
                }
            },
            {
                "element": {
                    "intro": "hola",
                    "end": "adios"
                }
            }
        ],
        "gender": [
            {
                "element": "M"
            }
        ],
        "record_id": [
            {
                "element": "2a2c6c86947719eacc1742adf1d6f2c7"
            }
        ]
    }
]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Instead of the desired format:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="java"&gt;[
    {
        "id": 1,
        "name": "John",
        "address": {
            "street": "Main St",
            "city": "New York"
        },
        "hobbies": [
            "coding",
            "music"
        ],
        "greetings": [
            {
                "intro": "hello",
                "end": "bye"
            },
            {
                "intro": "hola",
                "end": "adios"
            }
        ],
        "gender": [
            "M"
        ],
        "record_id": [
            "2a2c6c86947719eacc1742adf1d6f2c7"
        ]
    }
]&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;The downstream processes cannot be changed and cannot handle arrays whose items are each wrapped in a single-key "element" map.&lt;/P&gt;&lt;P&gt;When I try to use a ConvertRecord processor to write the records out with a ParquetRecordSetWriter to get the arrays formatted correctly, I get the following error:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="schema_error.png" style="width: 400px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/43596i1F7BC8FE68B5CBFC/image-size/medium?v=v2&amp;amp;px=400" role="button" title="schema_error.png" alt="schema_error.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;Many fields in the data are arrays, so it's not feasible to specify handling for each array field individually. Is there some schema handling I can do with ConvertRecord to avoid this error? It seems like it's writing the data out in the correct format and running into the schema conflict because of it. Alternatively, is there a better way to handle nested data coming from Parquet files?&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jan 2025 20:22:18 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ParquetReader-incorrectly-reading-arrays/m-p/400760#M250958</guid>
      <dc:creator>birdy</dc:creator>
      <dc:date>2025-01-20T20:22:18Z</dc:date>
    </item>
    <item>
      <title>Re: ParquetReader incorrectly reading arrays</title>
      <link>https://community.cloudera.com/t5/Support-Questions/ParquetReader-incorrectly-reading-arrays/m-p/400763#M250961</link>
      <description>&lt;P&gt;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/120128"&gt;@birdy&lt;/a&gt;&amp;nbsp;Welcome to the Cloudera Community!&lt;BR /&gt;&lt;BR /&gt;To help you get the best possible solution, I have tagged our NiFi experts&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/80381"&gt;@SAMSAL&lt;/a&gt;&amp;nbsp;&lt;a href="https://community.cloudera.com/t5/user/viewprofilepage/user-id/35454"&gt;@MattWho&lt;/a&gt;, who may be able to assist you further.&lt;BR /&gt;&lt;BR /&gt;Please keep us updated on your post, and we hope you find a satisfactory solution to your query.&lt;/P&gt;</description>
      <pubDate>Mon, 20 Jan 2025 22:21:17 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Support-Questions/ParquetReader-incorrectly-reading-arrays/m-p/400763#M250961</guid>
      <dc:creator>DianaTorres</dc:creator>
      <dc:date>2025-01-20T22:21:17Z</dc:date>
    </item>
  </channel>
</rss>

