<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Impala bug with nested arrays of structures where some of the fields contains null in Support Questions</title>
    <link>https://community.cloudera.com/t5/Support-Questions/Impala-bug-with-nested-arrays-of-structures-where-some-of/m-p/78507#M12592</link>
    <description>&lt;DIV&gt;Hi All,&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;We found a case where Impala returns incorrect values from simple query. Our data contains nested array of structures and structures contains other structures.&lt;/DIV&gt;&lt;DIV&gt;We generated minimal sample data allowing to reproduce the issue.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;SQL to create a table:&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;CREATE TABLE plat_test.test_users (&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&amp;nbsp; id INT,&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&amp;nbsp; name STRING,&amp;nbsp; &amp;nbsp;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&amp;nbsp; devices ARRAY&amp;lt;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&amp;nbsp; &amp;nbsp; STRUCT&amp;lt;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; id:STRING,&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; device_info:STRUCT&amp;lt;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;nbsp; model:STRING&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&amp;nbsp; &amp;nbsp; &amp;nbsp; &amp;gt;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&amp;nbsp; &amp;nbsp; &amp;gt;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;&amp;nbsp; &amp;gt;&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;)&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;STORED AS PARQUET&lt;/FONT&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Please put attached parquet file to the location of the table and refresh the table.&lt;/DIV&gt;&lt;DIV&gt;In sample data we have 2 users, one with 2 devices, second one with 3. Some of the devices.device_info.model fields are NULL.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;When I issue a query:&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;SELECT&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="http://u.name/" target="_blank"&gt;u.name&lt;/A&gt;, d.device_info.model as model&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;FROM test_users u,&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;u.devices d;&lt;/FONT&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I'm expecting to get 5 records in results, but getting only one&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="1.png" style="width: 545px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4761iE915712E88965565/image-size/large?v=v2&amp;amp;px=999" role="button" title="1.png" alt="1.png" /&gt;&lt;/span&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;DIV&gt;If I change query to:&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;SELECT&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="http://u.name/" target="_blank"&gt;u.name&lt;/A&gt;, d.device_info.model as model&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;FROM test_users u&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;LEFT OUTER JOIN u.devices d;&lt;/FONT&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I'm getting two records in the results, but still not as it should be:&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="2.png" style="width: 553px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4762iBFC7AE9CA2964983/image-size/large?v=v2&amp;amp;px=999" role="button" title="2.png" alt="2.png" /&gt;&lt;/span&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;We found some workaround to this problem. If we add to the result columns&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="http://device.id/" target="_blank"&gt;device.id&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;we will get all records from parquet file:&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;SELECT&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="http://u.name/" target="_blank"&gt;u.name&lt;/A&gt;,&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="http://d.id/" target="_blank"&gt;d.id&lt;/A&gt;, d.device_info.model as model&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;FROM test_users u&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;, u.devices d&lt;/FONT&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;And result is&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="3.png" style="width: 539px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4764i32CE6F19446C7014/image-size/large?v=v2&amp;amp;px=999" role="button" title="3.png" alt="3.png" /&gt;&lt;/span&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;But we can't rely on this workaround, because we don't need&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="http://device.id/" target="_blank"&gt;device.id&lt;/A&gt;&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;in all queries and Impala optimizes it, and as a result we are getting unpredicted results.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;I tested Hive query on this table and it returns expected results:&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;SELECT&lt;SPAN&gt;&amp;nbsp;&lt;/SPAN&gt;&lt;A href="http://u.name/" target="_blank"&gt;u.name&lt;/A&gt;, d.device_info.model&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;FROM test_users u&lt;/FONT&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;FONT face="monospace, monospace"&gt;lateral view outer inline (u.devices) d;&lt;/FONT&gt;&lt;/DIV&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;results:&lt;/DIV&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="4.png" style="width: 556px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/4763i54304DFF2616F74A/image-size/large?v=v2&amp;amp;px=999" role="button" title="4.png" alt="4.png" /&gt;&lt;/span&gt;&lt;/DIV&gt;&lt;DIV&gt;&lt;DIV&gt;Please advice if it's a problem in Impala engine or we did some mistake in our query.&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;&lt;A title="test_users_131786401297925138_0.parquet" href="https://drive.google.com/file/d/1ApPLVJpqebe8puuQZvU3m6857knIDq1L/view?usp=sharing" target="_blank"&gt;test_users_131786401297925138_0.parquet&lt;/A&gt;&lt;/DIV&gt;&lt;DIV&gt;&amp;nbsp;&lt;/DIV&gt;&lt;DIV&gt;Best regards,&lt;/DIV&gt;&lt;DIV&gt;Come2Play team.&lt;/DIV&gt;&lt;/DIV&gt;</description>
    <pubDate>Fri, 16 Sep 2022 13:35:20 GMT</pubDate>
    <dc:creator>Yurii</dc:creator>
    <dc:date>2022-09-16T13:35:20Z</dc:date>
  </channel>
</rss>

