Support Questions

Find answers, ask questions, and share your expertise

Impala Complex types: position in ARRAY of MAP

avatar

Hi,

 how can I access the position of the element (in my case of the MAP) in the array column?

 

TABLE DDL:

CREATE TABLE mytable (

    id int,

    location ARRAY< MAP< string, string> > )

 

When I query the table with this SQL I got a position of the KEY-VALUE pair INSIDE the MAP, but what I really want to access is the position in the location ARRAY, e.g. 1st location, 2nd location and so on.

 

select c.id, a.pos, a.key, a.value from mytable c left join c.locations a order by c.id, a.pos;

 

Thanks

 

 

1 ACCEPTED SOLUTION

avatar

Still investigating and working on the JIRA. In the meantimg, I think the query that you mean is this:

 

select t.id, l.pos as location_number, m.key, m.value from mytable t join t.locations l join l.item m order by id, l.pos;

 

The bug is that those queries returning wrong results are not semantically correct but Impala runs them anyway and gives "arbitrary" results.

View solution in original post

6 REPLIES 6

avatar

Hi Thomas,

 

in your example query the table alias 'a' has a 'pos' pseudo-column that refers to the element-position within the ARRAY, so I think it's indeed what you want.

 

However, I think there is a bug here that may lead to confusion.

The query you wrote "should" be illegal because you should not be able to access a.key and a.value without referencing the nested map in the FROM clause. I will follow-up with a JIRA, stay tuned.

 

Can you try running this and see if you get the expected results?

select c.id, a.pos from mytable c left join c.`location` a order by c.id, a.pos;

 

 

Alternatively try this query if you also want to explode the items within the nested MAP:

select c.id, a.pos, a.key, a.value from mytable c left join c.`location` a, a.item order by c.id, a.pos;

avatar

Hi,  

  thanks for the response, yes it seems to be a bug. You can test it yourself, here are the scripts to reproduce:

(in Hive, "i" is just a dummy table):

 

 

create table mytable( id int, locations array< map<string,string>>) stored as parquet;

insert into table mytable select 10001, array( map("Location1.City","Bratislava","Location1.Country","SK","Location1.LAT","41"), map("Location2.City","Kosice","Location2.Country","SK","Location2.LAT","42") ) from i limit 1;

insert into table mytable select 10002, array( map("Location1.City","Wien","Location1.Country","AT","Location1.LAT","40"), map("Location2.City","Graz","Location2.Country","AT","Location2.LAT","40") ) from i limit 1;

Query results in Impala:

 

sql1.PNGsql2.PNGsql3.PNG

 

These results are wrond, what I need is a location_number -

0 for Location1.City

0 for Location1.Country

0 for Location1.LAT

1 for Location2.City

1 for Location2.Country

1 for Location2.LAT

... and so on for every ID in the table.

 

What you suggested (adding l.item), returned a wrong result set, because City in the result set is twice, but LAT is missing - this seems to be definitely a BUG:

sql4.PNG

avatar

In Hive the solution is:

select id, ix, locations.key, locations.val from ( select id, ix, locs from mytable lateral view posexplode( locations ) x as ix,locs ) tab lateral view explode( locs ) locations as key,val;

which returns the correct result set:

sql5.PNG

avatar
Any update on this? Have you created the JIRA?
thanks

avatar

Still investigating and working on the JIRA. In the meantimg, I think the query that you mean is this:

 

select t.id, l.pos as location_number, m.key, m.value from mytable t join t.locations l join l.item m order by id, l.pos;

 

The bug is that those queries returning wrong results are not semantically correct but Impala runs them anyway and gives "arbitrary" results.

avatar

Filed the JIRA: https://issues.cloudera.org/browse/IMPALA-3938

 

Thanks a lot for your detailed report and easy reproduction!